

Dec 13, 2013 (4 years and 6 months ago)


>> Chris: Okay. So we're pleased to have here Abram Hindle who is visiting us from -- he's
right now a postdoc at UC Davis with Prem Devanbu and Zhendong Su. And he just recently
got his Ph.D. from Waterloo working with Mike Godfrey and Rick Holt. So he's going to tell us
about evidence-based software process recovery.

So take it away, Abram.

>> Abram Hindle: Thank you, Chris. So thanks for the introduction. So in this presentation
really what we're trying to do is we're trying to take a theoretical diagram that tries to explain
maybe what software development processes are and what to expect inside of a project such as
this unified process diagram. We're trying to take the theoretical diagram which never was
concrete, never was based on actual data, and extract it from a real project and see what it
actually looks like based upon real data, based upon things that were recorded.

So when we're talking about process, we're really talking about software development processes,
and these range over a wide variety of different kinds of ideas of what processes are.

So there is the prescribed process. So these would be things like you have to follow test first,
we're following scrum, we have little scrums in the morning, or we use story cards to define
requirements, things like that.

Many of these prescribed processes are oftentimes based upon formal processes, things like a
waterfall model, scrum, XP, unified process, anything like that.

But there's also a whole set of sort of structured behaviors which are basically the process of the
project. They're ad hoc processes. These are basically things that maybe as developers we've
come to an agreement that we're going to do, we didn't write it down, the manager maybe didn't
specify it, but it's sort of this default behavior that we follow.

So the process I'm talking about, basically we cover the whole range here from the formal to the
prescribed to the ad hoc.

So on the formal side, there's actually quite a few. There's the waterfall model which suggests
how development could be staggered, where you have requirements going into design and
analysis going into implementation and eventually deployment.

There's also the iterative view, such as the spiral model where basically you repeat this waterfall
model a couple of times over, so you keep reiterating over what you're doing and you have
basically multiple iterations.

Then the one we're mainly focusing on in this presentation is the unified process. And the
unified process is basically a model where you have multiple disciplines and you take part in
these disciplines in different proportions over time.

So this diagram here, the unified process diagram, this diagram we'll be coming back to where it
shows your disciplines, like your business modeling, and then it shows basically the amount of
effort over the existence of the project for that one discipline. So you can look at, say, one
release and see what the effort was around at that time, the proportion of efforts.

Okay. So the world that we're really dealing with if we want to actually look at, recover, and
extract these processes, is a world where developers -- they want to follow a software
development process but they -- in order to do so, they have to exhibit behavior. And this
behavior is exhibited in order to fulfill a purpose or task which compose the software development
process.

When they do something, when they exhibit a behavior, sometimes they produce evidence. Lots
of time this is lossy. It doesn't contain all the information. But this evidence can be used to
suggest the underlying processes, purposes, tasks and behaviors.

So the world that we're dealing with in order to actually recover these processes is where we
have this evidence that was produced from the behaviors they followed in order to fulfill their
processes, fulfill their purposes, and then we basically have this shadow world in which we try to
recover the behavior, recover the purposes and tasks, and use those behaviors and tasks to
compose the underlying software development processes.

And this all comes from the evidence that developers produce. But the evidence is lossy, so we
don't see everything, like if they have a meeting and it's face-to-face, maybe there's minutes,
maybe there's not, maybe they have a talk in the hall. We don't see everything.

So for the rest of this presentation, I'm going to break it down into basically behavior, intents and
purposes, and software development processes.

And we've got four basically different kinds of research that we integrated for this work where
we did release patterns where we basically looked at the types of files changed and the reasons
why you would change those files. And so that dealt with behavior and the process, because we
also correlated it with how they acted around release time, was there a freeze, things like that.

And then we had the large changes study where we looked at the purpose behind large changes
and tried to categorize them by that.

We also used topic analysis which helped describe the behavior and also helped elicit some of
the intents and purposes, followed by our summary of the processes, the recovered unified
process views. That was mostly about software development processes.

Okay. So who would actually want to see what was going on in a project from a process
standpoint? Well, there's a variety of stakeholders ranging from managers who aren't really
intimately involved in the code base who might not be sure what's going on, they might be
another tier above another manager, things like that; programmers who are basically shunted
around between projects and who basically fix messes.

So I got a friend in Victoria, and he's basically one of these. He goes between Java projects, and
he's got to figure out what the project's about, get the work done, and then get out, because, well,
he's the valuable guy and they don't want to waste him on all the small projects.

But then there's the new developers who are unsure of how a project is actually being done, what
are the protocols and procedures, how do you, say, check in code, things like that.

As well, there's other kinds of stakeholders who are not necessarily very code driven, such as
investors or people trying to acquire a company. They might be interested in what
underlying software development processes were being followed during the development of that
project within that company that they're interested in purchasing.

And as well there's ISO 9000 where you want to document your software, you want to get
certification, it's a big pain, and you don't really have a lot to work with.

So those are the main stakeholders who might be interested in this kind of recovery.

>>: [inaudible]

>> Abram Hindle: Thank you for [inaudible].

>>: Couldn't figure out what it was. Thank you.

>> Abram Hindle: Okay. So one example that a manager might have is they might propose a
process. They might propose it as a mixture of workflows over time. And then when they
recover this process, they can go ahead and actually see if it matches their proposed process.

So they could look at the different -- they could look at the similarities and differences between
the processes they recovered and the proposed, and then they can investigate further what those
differences actually were, why weren't their expectations met. They could just be straight
wrong, but it'd be interesting to know. So that's one potential use.

So how do we get this kind of information, how do we figure out what's going on? Well, we
could ask the developers. We could ask the people inherently involved. But there's issues there.
You might not have access to them. They might not be around anymore. Basically talking to
developers, I think, frankly, annoys many of them, especially if it's not really important to them.
And it also takes up a lot of time.

So if you are not going to interview developers a lot, how are you going to get your information?
Well, we could rely on software repositories and the data left behind. We could try to
summarize this in a kind of unified process manner where we basically break down a lot of the
information events here into the different disciplines such as business modeling, requirements,
design, implementation, and then we could look at how these things change over time and what
events relate to these different disciplines and workflows.

So I'm now going to cover a little bit of previous work and related research to this -- to our work.
So in terms of mining software repositories and stochastic processes, Israel Herraiz, et al., they
looked at the distributions of, say, metrics over time, things like McCabe Cyclomatic Complexity
and other things, and a lot of code metrics. And they looked at many, many open source projects
and they found a wide variety of distributions, most of which were log normal, double Pareto,
things like that which were sort of nasty exponentials. And they also found that many of these
metrics correlated very heavily with lines of code.

There were these laws of software evolution by Manny Lehman, and then people tried to take his
laws of software evolution and either validate them, such as in Turski, or invalidate them in some
cases, as in Tu and Godfrey, where Tu and Godfrey found that -- I think it was the ninth law
of software evolution that said that the growth would be sublinear due to complexity. They
found in the Linux kernel, at least due to copy and pasting drivers, that the growth of the Linux
kernel was superlinear. So it was a little bit above linear.

Other work we rely upon is business processes. This is really just another kind of process. It's a
little too formal for, say, information work and development. But we rely on it nonetheless.

So Van der Aalst basically would pose a business process as either a finite state machine or as a
Petri net where you push a token through, say, a finite state machine.

Then there is -- we rely on a wide variety of analysis, a little bit of social network analysis such
as work by Bird [phonetic] et al., as well as statistics, some natural language processing, mostly
at the level of counting words and doing topics. We also use machine learning to do
classification. We rely heavily on time series as the unified process diagram does as well.

So this work comes out of work by Van der Aalst on process mining where he would monitor a
live business process, like buying Chicken McNuggets at McDonald's or setting up insurance
clients, things like that. They would monitor and measure the process, and then they'd f
either as a Petri net or finite state machine.

Cook, et al., took that further and they applied this to software. So they tooled the process, they
modified the process to actually get more information so they could observe it and extract these
Petri nets and finite state machine representations of the process.

So our approach is a little bit different. It's the mining software repositories approach where we
take the information left behind and then we analyze that after the fact without access to the
process itself, without tooling the process to get more information. And we try to get things like
statistics, the underlying distribution of effort per discipline and other things out of that. So
that's really what process recovery is about.

And, again, what we're going to try to do is we're going to try to summarize the information
extracted from those repositories in a unified process kind of manner, mostly because it's been
used in software engineering textbooks to explain that we do a bunch of things at the same time
in software development but we might have different proportions of these disciplines at different
times. So you might not be doing a lot of requirements later on unless you're adding new
features, things like that. Sorry.

>>: Two questions about this. One is was this descriptive or prescriptive when it was created?

>> Abram Hindle: What's "this"?

>>: The model that we're looking at right now would be [inaudible].

>> Abram Hindle: Oh, it's -- I think it was descriptive, trying to explain what you'd probably --

>>: And then how does this handle hierarchical projects? Like Office is made up of six
different projects that are all synced together, and each one might have its own set of phases at
this point and they're all kind of unified [inaudible].

>> Abram Hindle: Well, I think you can have multiple views. So you can do your subprojects
as separate ones of these, and then you can have an aggregate view where this stuff would
probably be thrown away because it no longer syncs up. But at the very least you'd have the
proportion [inaudible]. So I'll get into that later. But I think it can be applied and you don't need
to apply it to the whole thing. It can be applied to subviews. So I'll get into that.

Other work we heavily rely on is the whole mining software repositories field where you
basically mine repositories like version control and other repositories in order to get information
about what was going on in a project at a certain time. And oftentimes this research does certain
things like try to predict faults and also expertise, basically, like who would be an expert in a
certain part of the project based upon their past history.

So in this work we rely mainly on three software repositories. We rely on discussion and
mailing lists, we rely on bugs in the bug tracker, and we rely on version control systems and the
revisions to source code and other files in those repositories.

So just a quick overview of what mailing list archives consist of. So basically mailing
lists are often topic driven. Some are user based, some are development based, some have a
different topic like off-topic discussions. And they're basically discussions between different
people and these discussions occur over time. And these discussions often reference what other
people have said and sometimes reference documents.

They also have a bunch of metadata in the header, and they have big natural language text in the
body. And this stuff can actually reference other things.

So it's oftentimes quite difficult to parse the body because you usually need something like
natural language processing or some kind of way of understanding it.

Followed by that would be bug trackers, which share some similarities. But I'd say the main
difference between, say, a mailing list and a bug tracker is that you have a bug ID, they've named
them. So not only do you have a subject, you've named basically the whole discussion itself with
a bug ID.

And they're sort of like mailing lists, but usually different software, usually a little bit different. But
you still have this discussion between people about what to do, referencing artifacts. And that all
occurs over time.

Then we had the version control system where we have authors over time making commits to the
code base. These commits are composed of revisions. These revisions basically are changes to
separate kinds of files, like build or configuration scripts, sometimes documentation, sometimes
the actual test. Oftentimes source code.

So those are the three main repositories we rely upon to extract information, at least in this study.
You don't -- if you had a documentation repository, you had any other information, might be

So other work we heavily rely upon are source code metrics, whether they're straight source
code, whether they're evolution metrics, such as like information about the deltas or, say,
coupling metrics where you measure how much files change together.

Other work we heavily rely on is topic and concept analysis. So Poshyvanyk and Marcus heavily
used LSI and somewhat LDA to figure out what entities are associated with certain concepts. So
a lot of this was unsupervised and automatic where these topics -- where they're extracted from
source code or natural language text would be extracted from the repository.

And then others such as Lukins, Linstead and Maletic and Hindle would actually use LDA to
apply it to natural language text, whether it was in the version control commit comments or it
was in the bug repositories.

So Lukins actually had an interesting paper where they would use LDA on the bug tracker in
order to find -- in order to query it for template bugs. So you provide a template of what your
bug sort of looks like, and then you ask the bug tracker and it comes up with a similar document.
So it was document retrieval.

Okay. Other stuff we also rely upon is quality-related nonfunctional requirements. So
Huang has published a lot on mining NFRs from source code and requirements
documents. And Ernst, Neil Ernst, has also published on just basically mining these
quality-related nonfunctional requirements from mailing list histories and version control histories.

Okay. So that was a lot of the work that we rely upon in this. So sorry about its length, but let's
get down to the brass tacks of software process recovery itself.

So we rely on subsignals for software process recovery. We rely on information extracted from
version control systems and things like this. In this case of release patterns, what we do is we
take the revisions of the version control system over time and we basically partition by file type.

So if it's a change to source code, we suggest it's a source code or implementation revision. If
it's a change to your benchmarks, to your unit tests, anything like that, we say it's a test revision. If
it's a change to your build files, Automake, Autoconf and your project files, it's a build change.
And if it's a change to your user documentation or your developer documentation that's recorded in
the repository, we suggest it's a documentation change.
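
A minimal sketch of the file-type partitioning just described, not the tooling used in the study. The
patterns in RULES and the names classify_path and signal_counts are assumptions made for
illustration; a real project would need its own project- and language-specific rules.

```python
import re
from collections import Counter, defaultdict

# Hypothetical path patterns, checked in order; the first match wins.
RULES = [
    ("test", re.compile(r"(^|/)(tests?|benchmarks?)(/|$)|\.t$", re.IGNORECASE)),
    ("build", re.compile(
        r"(^|/)(Makefile(\.am|\.in)?|configure(\.ac|\.in)?|CMakeLists\.txt)$"
        r"|\.(mk|vcproj|sln)$")),
    ("documentation", re.compile(
        r"(^|/)(docs?|man)(/|$)|(^|/)README[^/]*$|\.(txt|md|texi|sgml)$",
        re.IGNORECASE)),
    ("source", re.compile(r"\.(c|h|cc|cpp|hpp|java|py|pl|pm|sql)$", re.IGNORECASE)),
]


def classify_path(path):
    """Return the signal name for one changed file, or "other" if nothing matches."""
    for label, pattern in RULES:
        if pattern.search(path):
            return label
    return "other"


def signal_counts(revisions):
    """Count revisions per signal per period.

    revisions is an iterable of (period, path) pairs, for example ("2001-01", "src/a.c").
    """
    counts = defaultdict(Counter)
    for period, path in revisions:
        counts[period][classify_path(path)] += 1
    return counts
```

Because the test rule is checked first, a C file under a tests/ directory counts as a test revision
rather than a source revision, which is one way of making the partitions non-overlapping.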

Now, this doesn't sound that useful, but it actually is useful once you aggregate it in a large --
especially with respect to events like a release.

So what we found was that in certain projects, like, say, MySQL, if you looked at these signals
across release time, you could get a general behavior. That behavior was actually consistent
across the release types. So minor releases in MySQL would look the same. They'd have the
same kind of behavior around release time for source code. They might have a lot of changes
and then it would taper off after.

Where something like PostgreSQL did more of a freeze, where they would have no real source
changes, maybe a bunch of test changes and minor build changes before a release, then
afterwards they'd have a huge spike in source code changes because they'd integrate all that cool
stuff they were working on, which they couldn't have integrated before the freeze.

So you could see some kind of behaviors, especially process-related behaviors, based upon these
four signals in a very simple manner.

>>: When you just mentioned the story about Postgres, how do you -- was that story derived
straight from these kinds of diagrams, or are you using some of your own knowledge about how
software development works [inaudible] to embellish that story? Is that grounded or is that --

>> Abram Hindle: So it was based upon the data, but also in order to see it, I guess you got to
know [inaudible] occurred. So I saw it, I thought [inaudible].

>>: But you didn't ask that --


>> Abram Hindle: No, I didn't ask that.

>>: [inaudible]

>> Abram Hindle: No, I didn't ask that. So the usefulness of this is that it's just relatively
simple. It's basically partitioning by file type, and you get these four different signals suggesting
what kind of behavior is happening. And you can do neat things like look at the correlation
between the tests and source code. You could ask things like are we doing tests first, things like --

>>: Did you try any other splits to see whether that gave you more interesting signals, or are
these the four that were the best ones?

>> Abram Hindle: In this study, we only did these four. But if you have -- and this was on file
type. So you could also split on author, because authors are pretty heavily loaded, especially in
open source projects where you have the top three authors are really responsible for
everything. So if you would subsplit these by author, it might give you more information.

>>: How do you differentiate between source code and test code?

>> Abram Hindle: Well, these don't have to be straight-up partitions. They can overlap. But
you can just say anything that's test code is not source code. It's up to --


>>: I guess my -- how do you identify test code?

>> Abram Hindle: Oh. Okay. So in Perl you look for .t files. The quick way, the dirty way,
which requires no supervision, it's dangerous, is to look for "test". But in, say, something like a
database system, there's a lot of things which are test that are not test code, and especially in, say,
a package like R, there would be a lot of cases where tests would be actual code.

So you got to be a little bit careful with test code, and mostly you're relying on the idioms that
the programmers use to identify what parts of the system are test.

>>: So it's pretty project specific.

>> Abram Hindle: And language specific. So basically for this I had a bunch of
specific test [inaudible]. So like .t for Perl, benchmark, and there's a few others. So
the problem with tests is there's also different kinds of testing [inaudible] like regression tests,
and benchmarks would still be considered tests by a lot of people.

And in a database system, benchmarks are really important. So if you've got a
oriented project, you might have to get more specific with the tests.

>>: Have you ever partitioned data into like the different types of tests to see if there's any
different [inaudible] like unit tests versus [inaudible]?

>> Abram Hindle: No. You're making me feel stupid, because that sounds like a great idea.

So this is just a quick example of applying it to SQLite over time. So from 2001 to 2010. You can
see there's lots of source code revisions. There's a bunch of test revisions. There's hardly any
documentation revisions. And there's a couple build revisions. So this is just a concrete view of
the source, test, build, and documentation revisions.

The next thing we did was a large -- a study of the large changes in version control systems, and
we basically categorized them with the three Swanson maintenance classifications, so we
manually looked at them across I think 18 open source projects.

>>: How do you define a large change?

>> Abram Hindle: It was top 1 percent in size. So size of lines changed.
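
The "top 1 percent in size" cut can be sketched as below. This is only an illustration under stated
assumptions: the function name largest_commits and the (commit_id, lines_changed) input format are
hypothetical, and rounding the cut up so that at least one commit is kept is a choice made here,
not something stated in the talk.

```python
import math


def largest_commits(commits, fraction=0.01):
    """Return the largest `fraction` of commits, ranked by lines changed.

    commits is a list of (commit_id, lines_changed) pairs.
    """
    if not commits:
        return []
    # Round up so that small histories still yield at least one commit.
    keep = max(1, math.ceil(len(commits) * fraction))
    ranked = sorted(commits, key=lambda commit: commit[1], reverse=True)
    return ranked[:keep]
```

The selected commits would then be the ones inspected and labeled by hand.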

What was interesting was for the large changes, the vast majority weren't really that Swanson
orientated. They weren't really maintenance orientated. They were implementation. So some
people would suggest an implementation would belong in adaptive. But these were explicitly
implementation, like lots of times larger merges from another project, a totally new feature,
things like that.

We also found that while the Swanson maintenance classifications weren't really version control
specific, we were dealing with version control. So we had to deal with things like copyright
changes, legal changes, comment changes, things like that, stuff that never would change the
execution of the code but existed nonetheless.

And so we applied it to many projects. And not all projects were consistent. I guess relevant to
here, the Samba project, which is basically the Linux version of the Windows network filesystem,
they had a ton of adaptive changes because they had to adapt a lot to anything that changed in the
Windows filesystem. So some are more consistent. Like Firebird, which is a database system,
was pretty consistent across --


>>: So does Evolution have no bugs?

>> Abram Hindle: Evolution.

>>: There's no corrective.

>> Abram Hindle: No big bugs. No like hundred-line bugs. Yep.

>>: So what do you do with this? It's pretty.

>> Abram Hindle:

>>: Like what's -- so at the beginning you talked about applications, but you -- what's the
application for this diagram?

>> Abram Hindle: Oh, this diagram is to show you sort of what exists in the open source world.
So we took the previous information, the manual stuff, and we checked to see if we could apply
machine learners to automatically classify the changes.

>>: What's the user -- who's the user that's looking at this and what is their need?

>> Abram Hindle: Okay. So the first user would be the researcher who learns they shouldn't
throw away the big outliers, because big outliers can change architecture. So that's important.

The second user would be more of an end user with the previous data where this data would be
used to train the learner which would automatically classify their changes.

>>: [inaudible]

>> Abram Hindle: A machine learner.

>>: What about a human?

>> Abram Hindle: Well, they'd at least get to see an overview of what the changes were thought
to be. So you could use the learner to tag a change with, say, adaptive or corrective. And before
they look at the change, they already see it's been tagged with adaptive or corrective. This would
allow querying, allow them to scroll through changes and decide I only want to see the bug fixes
or I want to see the [inaudible] changes or what were the last set of license [inaudible].

>>: How difficult is it to train a learner to do that? Seems like deciding between whether the
change is like adaptive [inaudible] tough.

>> Abram Hindle: So we did single classes here. We learned the hard lesson that this is
software, categorization is not so hard and fast. So you want to use a multilabeled one. So that
was the things we really learned.

>>: What features did you --


>> Abram Hindle: Oh. The features were -- they were actually really interesting. They were
file type, author, the text in the change commit. I think that was about it. And what we found
was that you could throw away all the files changed and keep only the author, or you could keep
all the files changed and throw away the author.

There was so heavy a correlation in shared information between those two that you could choose
one or the other. The author was very, very important to determining what this was, which
suggests that in some projects --


>>: [inaudible] with the projects?

>> Abram Hindle: Yeah. In some projects that certain authors wear a few hats.

>>: How do you validate the training for your learner? I mean, how did you validate that? You
found those labels, right?

>> Abram Hindle: Oh. We couldn't really validate too well that we got the labels right in the
manual labeling where we went through and we labeled. So me and Daniel German did look at
each other's labeling. We didn't ask him. We didn't go to developers. We didn't ask them.

>>: [inaudible] randomly sample or something for the automatic labeling?

>> Abram Hindle: Yeah. So we labeled a bunch. And then we trained the learners. And then
we tried the learners using [inaudible] validation to see how well they did against each other. And
they didn't do super great. They were like area under the ROC curve, like .6 to .8 depending on
the project. And that converts to a [inaudible] score about -- I guess it's easier to think about the
letter grades. So an 80 is okay whereas .6 would be like a [inaudible], whereas .5 --


>>: Maybe in Canada. It's a D here.

>> Abram Hindle: It's a D? Wow, you guys are tough. Okay.

So then the other work we used were developer topics. And so what the developer topics were is
we took the change log comments and we pushed them through an unsupervised topic analysis
engine, like LSI or LDA, and then we got the topics out. So we say LDA, give us 20 topics for
this input text, and it gives us 20 topics. And these topics are basically word distributions. So
basically counts of words, which isn't really that useful.

So what we did is we applied it per month, and then we looked to see if any of the topics
reoccurred in the consecutive months. And what we found was most topics don't reoccur. About
80 percent don't reoccur. They're very specific to that month. Sometimes the topics would
mention even a bug number, and it would be a bunch of documents, a bunch of changes related
to that one bug, but it wouldn't occur in the next month.
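
One way to check whether a topic reoccurs in the next month is sketched below. The talk does not
spell out how topics were compared, so the comparison used here (counting shared top words, with
the hypothetical parameters k and min_shared) is an assumption for illustration only, as are the
function names.

```python
def top_words(topic, k=10):
    """Return the k highest-weighted words of a topic given as {word: weight}."""
    ranked = sorted(topic.items(), key=lambda item: item[1], reverse=True)
    return {word for word, _ in ranked[:k]}


def recurring_topics(months, k=10, min_shared=8):
    """Find topics that also appear in an adjacent month.

    months is a chronological list; each entry is the list of topics for that month.
    Returns a set of (month_index, topic_index) pairs for topics that share at least
    min_shared of their top k words with some topic in the neighbouring month.
    """
    word_sets = [[top_words(topic, k) for topic in topics] for topics in months]
    recurring = set()
    for m in range(len(word_sets) - 1):
        for i, current in enumerate(word_sets[m]):
            for j, following in enumerate(word_sets[m + 1]):
                if len(current & following) >= min_shared:
                    recurring.add((m, i))
                    recurring.add((m + 1, j))
    return recurring
```

Topics that never show up in the returned set correspond to the month-specific ones; chains of
matches across several months correspond to the long-running ones.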

Whereas there were some topics which occurred across time. And we looked at these big long
topics because they were sort of interesting. And we found that they seem to deal with --

Did you ask --


>>: [inaudible]

>> Abram Hindle: Yes.

>>: Can you explain the colors?

>> Abram Hindle: Okay. So gray is never repeat. Not gray is does repeat.

>>: Okay. And what about the boxes?

>> Abram Hindle: Each box is a topic. And if you had a PDF viewer, you could zoom in and
you could see the words. The top ten words in that topic embedded in there. So it's technically
like a zoomable graphic.

So this sort of illustrates how fundamentally annoying the output from, say, LSI or LDA is.
Because these are the top ten words. It will give you many more words.

>>: So how do you identify -- like you've got this big brown box at the top [inaudible] I doubt it
was the same ten words every [inaudible].

>> Abram Hindle: No, it wasn't the same words. So we didn't have a threshold like they did.

>>: Oh, okay.

>>: Why are some boxes lighter than others?

>> Abram Hindle: Like this box?

>>: Yeah, or the boxes in the lighter brown, some are narrower and some are wider than the
gray boxes, let's say, or --


>> Abram Hindle: Because if they -- if the topic occurs in the next time window, we join the
boxes. So this box occurs from 2004 July to 2006 March.

>>: Oh, so that's why the big box.

>> Abram Hindle: Yeah. So that's a topic that spanned a long time.

>>: So you're seeing a lot more words for that topic than you are for any of the gray topics.

>> Abram Hindle: Well, it's a lot of topics joined together over time. So these are topics that
were similar to each other joined together. And I think this one was correctness orientated. So a
lot of the words dealt with bugs and bug fixing and fixes.

And so based upon that observation, we felt, well, this diagram in itself is really not that useful
until we interpret the topics, right? Like right now this is just some giant matrix, right? It's not
really that fun.

And so what we tried to do is we tried to label the topics. And we had one interesting method
which was unsupervised where we provided a dictionary of software engineering terms related to
nonfunctional requirements like portability or, say, reliability, and then if a topic contained any
of these terms, we just labeled it with the concept, portability or reliability. And we had five
NFRs that we labeled the topics with.
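
The dictionary-based labeling can be sketched as follows. The word lists in NFR_TERMS are
placeholders invented for this example, not the vocabulary used in the study, and label_topic is a
hypothetical name; only the rule itself (label a topic with a concept if it contains any of that
concept's terms) comes from the description above.

```python
# Hypothetical term lists; the real dictionary would be much larger.
NFR_TERMS = {
    "portability": {"portability", "portable", "platform"},
    "reliability": {"reliability", "crash", "failure"},
    "maintainability": {"maintainability", "refactor", "cleanup"},
}


def label_topic(topic_words, dictionary=NFR_TERMS):
    """Return the sorted list of concepts whose terms appear among the topic's words."""
    words = {word.lower() for word in topic_words}
    return sorted(label for label, terms in dictionary.items() if words & terms)
```

A topic can receive more than one label, and a topic that matches no term simply stays unlabeled.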

And this allowed us to produce a similar diagram but with labels. So this one was
maintainability, that one was portability, and there also was -- there were topics that dealt with
more than one issue. So they'll be maintainability and portability, things like that.

>>: Did you see anything that came up relative -- like periodically relative to the release cycle,
anything like that?

>> Abram Hindle: Not specifically that I can remember. But what was interesting was that lots
of the repeating topics were actually related to the nonfunctional requirements. So they were
issues that cross-cut a lot of other projects, issues like performance, maintainability, portability,
functionality, efficiency.

And by using a very simple dictionary based upon mining ISO 9126, which is software quality
standards something or other, I don't remember, they had a bunch of words in there. We stole
those words, put them into this dictionary, threw it at this, worked out.

Then we also tried, well, let's use WordNet. WordNet was interesting. It had similar
performance, but WordNet would include neat little words. So for things like efficiency, it'd
include theater. The reason why the WordNet would include theater is because performance and
theater go together in the English language, but for software, it's not really that meaningful.

So we thought it'd be really nice having like a software engineering WordNet where it was
more domain specific, and then underneath that having a domain-specific WordNet would be
cool too, like for databases.

>>: Did the authors of these topics correlate similarly to the previous study where you were
looking at who made similar types of changes, large changes to the project where you said you
could probably just save that guy if he makes [inaudible] could you do the same thing, this guy,
he always talks about efficiency?

>> Abram Hindle: So we didn't do that, but it'd be pretty simple to go through the file just for
authors corresponding [inaudible]. That's a good idea.

Okay. So we had the labeled topics which would -- what was also neat is these topics are related
to documents. So when we get this topic, it's back-related to the documents.

So basically what LDA tried to do is they tried to say, hey, look, you can compress these
documents you gave me by these mixture models of topics and via that you also know which
documents relate to which topics, thus you know, given this topic, what documents are related.
So you can use that to tag documents as well.
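A rough sketch of the tagging idea, assuming the per-document topic mixtures have already been computed by some LDA implementation and are available as plain dictionaries. The document ids, topic names, and the 0.2 threshold are all hypothetical.

```python
def documents_for_topic(doc_topic, topic, threshold=0.2):
    """Return ids of documents whose weight for `topic` meets the threshold.

    doc_topic maps a document id to a {topic: weight} mixture, as an LDA
    run would produce; the threshold is an arbitrary illustrative cutoff.
    """
    return sorted(
        doc_id
        for doc_id, mixture in doc_topic.items()
        if mixture.get(topic, 0.0) >= threshold
    )


if __name__ == "__main__":
    # Hypothetical mixtures for three commit messages.
    mixtures = {
        "commit-1": {"topic-a": 0.7, "topic-b": 0.3},
        "commit-2": {"topic-a": 0.1, "topic-b": 0.9},
        "commit-3": {"topic-b": 1.0},
    }
    print(documents_for_topic(mixtures, "topic-a"))
```

Once a topic carries a label such as portability, the same lookup gives the set of documents that can be tagged with that label.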

Okay. So we did a bunch of work which didn't really seem all that coherent. But we had to
string it together. So what we really tried to do was we tried to take all that previous work and
we tried to integrate it in order to take this theoretical diagram of what software process was and
produce a practical version of it where we took those previous signals, those previous events and
information that are tagged, and we produce a practical view of it based on aggregating those.

So as an example, we had the unified process requirement signal. So we had a
based word bag. So basically with the dictionaries that I mentioned before, we grepped
through three repositories: version control, bugs, and mailing list. And then we also looked for
the NFRs we were able to grab, such as usability and functionality.

So this was open source code. We weren't really sure where requirements were discussed for the
most part. In many cases open source projects don't really have a lot of requirements other than
clone that other project. So sometimes a requirement's already implicit. Yeah. And sometimes
they have external requirements documents.

So we suggested that the UP requirements view, the requirements signal would be this mixture
model of these. And in this case we just have coefficients of 1, so it's a summation.
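As a sketch of that summation, assuming each repository contributes a mapping from time period to a count of requirements-related events. The repository names and counts below are made up for illustration.

```python
from collections import Counter


def combine_signals(signals, coefficients=None):
    """Weighted sum of per-period event counts across repositories.

    signals maps a repository name to {period: count}. Any repository
    without an explicit coefficient gets a weight of 1, which reduces the
    mixture to a plain summation.
    """
    coefficients = coefficients or {}
    combined = Counter()
    for name, counts in signals.items():
        weight = coefficients.get(name, 1)
        for period, count in counts.items():
            combined[period] += weight * count
    return dict(combined)


if __name__ == "__main__":
    # Hypothetical monthly counts from three repositories.
    requirement_events = {
        "version_control": {"2001-01": 3, "2001-02": 1},
        "bugs": {"2001-01": 2},
        "mailing_list": {"2001-02": 4},
    }
    print(combine_signals(requirement_events))
```

Passing a `coefficients` dictionary is where a project-specific weighting would go, for example to down-weight a noisy mailing list.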

So what this is is it's basically the events that are related to requirements over time, pulled from
three repositories. And we haven't done any kind of real mixing of it other than summing them.

And I'm not saying this is hard and fast. I think if you had a project and you knew that, say,
you actually had a documentation repository where requirements documentation was or you had
story cards and you had a signal where you knew how many story cards you had over time
created, removed, things like that, well, you'd want to include that in the requirements signal.

So we just -- we had to do something that would -- tried to produce something that would
look like the unified process model on something that wasn't necessarily unified process.
Because our purpose with using the unified process diagram was to communicate, was at least a
first step to show that, well, this could be done in a certain manner.

>>: So there's [inaudible] 2001.

>> Abram Hindle: Yeah.

>>: What does that mean? What happened? What does that mean?

>> Abram Hindle: Okay. I think this one is FreeBSD. And in 2001 I grepped around because I
was worried about that spike. Because if it's not requirements related, it's irrelevant. So there
actually were requirements-related events occurring at the time. And one of them was they were
trying to meet up with Single UNIX Specification, Version 2. So they were trying to conform to
that external requirements document. They were mentioning that in the version control.

Another reason requirements got ticked up was in terms of one of the requirements words I think
was definition. And they were converting to GCC 2.96 at the time. And they mentioned that
they were changing function definitions. So that's probably the majority of the peak. But there
was requirements-related stuff there.

So the design signal looks pretty well similar to this one, and at least it peaks up in the design
signal. So this isn't necessarily very accurate.

>>: So you've got a lot of like external knowledge about these. Does that just come from you
follow these projects, you're aware, or was -- like let's say that I didn't know anything, and I'm
like, dude, there's this spike there. What would I do to try to figure out what actually was going
on?

>> Abram Hindle: [inaudible] to find what was going on in the spike. I use AWK. And I said
AWK, on these files, these CSV files, between these two dates, grab me those, and then I use
grep and I told grep here's my requirements document or my requirements dictionary, grep
anything that matches that. So [inaudible] and then I looked at what was there.
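The same date-window-then-dictionary filter can be sketched in Python. This is an approximation of the AWK and grep steps described above, not the actual commands used; the column names `date` and `message`, the sample rows, and the word list are all assumptions.

```python
import csv
import io
from datetime import date


def events_in_spike(csv_text, start, end, dictionary):
    """Yield rows dated within [start, end] whose message contains a dictionary term."""
    terms = [term.lower() for term in dictionary]
    for row in csv.DictReader(io.StringIO(csv_text)):
        when = date.fromisoformat(row["date"])
        message = row["message"].lower()
        if start <= when <= end and any(term in message for term in terms):
            yield row


# Hypothetical extract of a commit log in CSV form.
SAMPLE = """date,message
2000-12-30,Fix typo in man page
2001-03-02,Conform to Single UNIX Specification
2001-03-05,Change function definition for new compiler
2001-04-01,Update build scripts
2002-01-01,New requirement for installer
"""

if __name__ == "__main__":
    words = ["specification", "definition", "requirement"]
    for hit in events_in_spike(SAMPLE, date(2001, 1, 1), date(2001, 12, 31), words):
        print(hit["date"], hit["message"])
```

Like grep, the substring match ignores any schema beyond the two assumed columns, so the matching rows still need to be read by hand to decide what the spike means.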

>>: Okay. So you could do it without external --
>> Abram Hindle: Yeah. So if like [inaudible] user interface [inaudible] then yes.

>>: Okay.

>>: It seems that when the project is smaller [inaudible] 1994 and 1995 a spike might still
exist, whereas in 2001, if you show it -- like do you normalize per quarter [inaudible]?

>> Abram Hindle: No, I didn't. And I mention it in the paper it's based on -- it's something you
might want to do. Because, you're right, there's very little here and there's a heck of a lot over
here. And in terms of version control, it's pretty well [inaudible]. There's a ton of work over
here and very little done over here.

So you do want to -- if you're looking at a specific time period, you might want to normalize for
that time period. And you might also want to normalize this up based on size.
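One possible normalization, sketched under the assumption that total activity per period (for example, all commits in a year) is available alongside the requirements counts. This is only one of several reasonable choices and is not something the study itself applied; the numbers are invented.

```python
def normalize_by_activity(signal, total_activity):
    """Divide each period's count by the total events recorded in that period.

    Periods with no recorded activity are dropped rather than divided by zero.
    """
    return {
        period: count / total_activity[period]
        for period, count in signal.items()
        if total_activity.get(period)
    }


if __name__ == "__main__":
    # Hypothetical yearly counts: a small early project and a busy later one.
    requirements = {"1994": 2, "2001": 50}
    all_events = {"1994": 10, "2001": 1000}
    print(normalize_by_activity(requirements, all_events))
```

With these made-up numbers the 1994 value ends up larger than the 2001 value even though the raw 2001 count is far higher, which is the effect the question about small early years is pointing at.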

>>: Would you normalize [inaudible] individual components first or would you normalize
[inaudible]? Because each question might be -- you might want to normalize versus the number
of [inaudible] versus the number of people who ever looked at that [inaudible].

>> Abram Hindle: Yeah. So there's definitely multiple ways of doing the normalization. And
so I don't have any hard and fast [inaudible].

So if I was to give this to an end user, like a manager, they'd have all these coefficients which
they could fiddle with and potentially see what they want to see but also potentially see what's
there. So there is that balance. Because --



>>: [inaudible] talked about these kind of things matching up to the release cycle [inaudible] at
these signals [inaudible]?

>> Abram Hindle: No, I didn't correlate with [inaudible]. I basically -- for validating this, it was
two case studies, FreeBSD and SQLite, which we'll get into. And basically I was looking at
mostly the peaks. So I didn't do negative validation [inaudible] it was mostly just two case
studies looking at are the large behaviors visible.

So there's definitely a lot more validation work to be done on this. But I think what was neat
about it was that we showed that we could try to derive some kind of process view out of the
events that occurred, and I don't think unified process is really all that valid for every project,
especially a lot of the open source projects. You'd probably want to show some more concrete
signals as well. Like I think the build signal is very important for a C project, because every time
you add a .C file, well, you're probably going to change the makefile or the [inaudible] file,
one of those things.

>>: How do you choose the projects that you decided to look at?

>> Abram Hindle: Ad hoc. So I had FreeBSD because I had done the mining challenge, and it
was a long-lived project. And SQLite was long lived as well. And I had just written a Fossil
extractor, so I was the only guy who actually had most of SQLite's information because no one
else had done the Fossil extractor.

So it was basically two case studies of long-lived projects which were popular. So FreeBSD is
popular and SQLite's popular.

>>: If you had to -- let's say that like you could snap your fingers right now and get the data for
another open source project, what would be one that, given what you know about them, would be
a good one to look at? Do you think like [inaudible]?

>> Abram Hindle: I'm not going to answer unified process one, but I'd use Apache, would be the
next step. Because


>>: Because everybody looks at Apache?

>> Abram Hindle: Well, because Apache is very explicit about what their process is. And if I'm
doing process validation, obviously the next step is compare -- we said we did this with what we
did. So like this stuff is pretty well baby steps towards the real software [inaudible] needs a lot
of validation. A lot of people aren't really up to this point. So this is what I got a thesis out of.

Okay. I'll carry on. The implementation signal is much more concrete. I took the source code
changes and I said they were implementation changes. They might have been maintenance
changes, but at the very least it was a very concrete signal, very direct.

The testing signal is more interesting because I took the testing changes along with the
portability changes, which we can argue about, and the efficiency changes because it might be
benchmarking regression tests. Especially something like FreeBSD or SQLite where they care
about performance. And I also took reliability changes because a lot of those sometimes when
you do a fix you might do a test.

So this is the most concrete of the signals and these ones I would say have less power but they
might be relevant in terms of regression tests and performance testing. And this produces the UP
testing signal.

>>: [inaudible] how would you write error bars on these things in the sense of you have some --
so you're doing several different kinds of analysis that each have their own possibility for error
[inaudible] and merging them together, is there a sense that you would actually -- it would be
[inaudible] more clear if you drew like a band of what the trend could be over time as opposed to
these individual spikes which actually probably are attenuated based on error.

>> Abram Hindle: I don't think you can truly get error until you have a concrete view of what's
going on. Oracle can do everything, then you know. But if you don't know, then sort of hard to
tell an error. So in that case I think you rely on the confidence of those people who have
expertise in some of these signals. So it's sort of more accumulative.

So that's a big problem with it is how to view it and how to display it and how to analyze it.

>>: It's interesting. Your source signals have pretty high standard deviations, but then the
summation of them seems to have [inaudible].

>> Abram Hindle: Yeah.

>>: And it's interesting to me that they are not mutually supporting; that what that's suggesting
is there is a not high degree of correlation between those four source signals.

>> Abram Hindle: Yeah. If we go back to the topics, we could see that certain topics are
prevalent over different periods. So I think this has to do a lot with how software isn't about
everything at once; it's about what you're focusing on at one time. At least if you look at one
slice, what are we doing right now, we're not doing everything, we've chosen to do a couple little
things. So that might be a kind of topic shift.

>>: Right. Which kind of makes you wonder [inaudible]. Yeah. It shows you pick signals that
are highly correlated or should you pick signals that are deliberately not correlated at all. So
you're getting some sort of --


>>: And then you get a flat line [inaudible] which is by definition interesting.

>> Abram Hindle: You don't have to totally smoosh. You can look at these signals and then you
can look at the signals they're derived from. So I believe like the build signal is a really great
example of something that has a lot of interesting information in it for certain kinds of projects,
especially C projects. Because they -- oftentimes the build change would indicate an
architectural change.

>>: Yeah, I guess that comes back to your question about [inaudible] there's no story here.
There's not -- it doesn't feel -- certainly with these [inaudible] there's not -- there's no
happening. Maybe that comes back to what Chris was saying earlier, you've got to be deeply
contextualized on this step to -- for these [inaudible]. And I don't know if that's a presentation
problem or --

>> Abram Hindle: Well, I don't have the releases [inaudible].

>>: [inaudible] test it, let's say, you took a release manager or a testing manager and you gave
that in the chart for their project [inaudible].

>>: Right. Exactly.

>>: And how accurate are they or how knowledgeable are they without having --

>> Abram Hindle: Yeah.

>>: And maybe there's another question of like why is it interesting to reflect on a decade and a
half of the history of those projects? What do we expect to learn from that?

>> Abram Hindle: Doesn't need to be the whole project.

>>: But it is. I mean, you're showing me the whole project.

>> Abram Hindle: That's true.

>>: It is what it is. So you're showing me this picture for a reason. And what is that reason?

>> Abram Hindle: Well, I needed a way to express what was potentially the underlying software
development process, a view of it [inaudible]. And so if we look at the original UP diagram, it --
the UP diagram has the whole lifetime on there. They've got the inception and then
they've got -- what's the last phase when you peter out --


>>: Transition?

>> Abram Hindle: Yeah, transition. So they have that in there. So they've got the whole
lifetime. So this was first step in trying to show, well, what would the lifetime of [inaudible].

>>: [inaudible]

>> Abram Hindle: So there's definitely a lot of issues with it, and there's a lot more validation to
be done and a lot more investigation in like how to actually show it as well as in future work I
mentioned, I really want to see if iterations are automatically identifiable or what adding the
iteration bars would really tell you.

Okay. So I already discussed the FreeBSD stuff. Sorry.

>>: It seems like there's another problem with smooshing these things together like the -- if you
have a particular issue, presumably a feature, like it's this fast to log on -- gosh, I wish we had
this particular feature in [inaudible]. Presumably that's coming up on user lists, then the
developers are doing a whole bunch of [inaudible] and then the user list is going back on the user
[inaudible] for a particular issue, that's going to kind of move through the pipeline --


>> Abram Hindle: Yeah. But at least with the smooshing you get to see, you know, in this
repository [inaudible] in repository 2 it was visible, repository 3 it was visible. So it might look
flat even though in repository 3 it might look -- so it's a multirepository view.

So it's already a multidimensional signal in total, so how do we show that. We could do other stuff
[inaudible] but I don't think that's as valid as, say, asking people how much they trust a signal
and how much they want to see it. Like do you really want to see the user list? Maybe you do.
Maybe it's very important. Maybe actually a lot of dev work [inaudible] into one user list. Like
for SQLite, these developers don't really let a lot of people onto the dev list.

So I'd already explained this before when Rob asked about the peak. And so just to reiterate, it
was the GCC 2.95 port and it was the Single UNIX Specification, Version 2, conformance.
And so that caused this peak in analysis and requirements. At least that's what I grepped out of
there. And I'm -- wasn't really sure about testing.

So we applied this to SQLite as well. And we looked at the interesting peak at the end which
was across quite a few things. And so we see this big requirements peak here. If we look over
here, the configuration management peak is different. The testing peak is different. So it's not
necessarily the same event.

So in terms of requirements, this was really interesting. They went to their .H files, and they had
requirements jammed into the .H file comments. What they did here is they actually took those
out and they made a formal requirements document. Very rare in open source and
also sort of strange seeing as 2001 was over here, and this occurred in 2009.

So later I went and looked and looked for why would you do this. And they basically wanted to
have a requirements document where anyone could reimplement SQLite, even though it is public
domain software. So not even an open source license, it's public domain. Like you're free to
take it and no attribution.

But what was interesting was, yeah, that was noticed, the requirements thing was noticed. And
there was also another requirements peak around here which was interesting, which was the
SQLite 3 discussion where they were referencing SQL books to look at for, say, implementations
of B-trees, things like that.

So at least in terms of some of the peaks, they panned out in terms of what was in the
repositories, and that was validated by AWK and grep. Actually, grep's really nice in terms of
ignoring schemas. So you can basically grep across data without the schema getting in the way.
So that actually has some value. So does SQLite.

So this led to issues of observability. So we discussed the requirements, we discussed the
weightings, we discussed things like that. Certain signals were not as observable, particularly
business modeling. In open source sense, not a lot of projects have business modeling. Maybe
Evolution had a little bit at the start in their mailing list, the mail client, but that was about it. So
not everything was observable, and that was one of the issues.

So some common threads we observed while going through all this stuff was there were idioms
and we could rely on idioms, whether they're file naming, other kinds of naming, behaviors, use
of different kinds of files. And this also related to sort of a vocabulary within the project. A
kind of lexicon used internally to a project which actually had little shared vocabulary.

But of that shared vocabulary we found that many of the shared terms seemed to relate to
nonfunctional requirements, like usability, maintainability, portability, and especially reliability
and correctness.

>>: Did you -- could you identify these idioms automatically?

>> Abram Hindle: I don't know. Maybe.

>>: It seems like -- I mean, each [inaudible] I look at has their own idioms for a number of
different I guess dimensions. So I bet it'd be useful for like maybe a newcomer to a project or
something like that just trying to -- having gone through this just recently, for like, you know,
joining scripts or something within a project, understanding the idioms or seeing examples or
saying like this is prevalent [inaudible] would be --

>> Abram Hindle: I think language management is sort of the next sort of big software
engineering tool. So let's try to fortify, you know, this word means this and we're going to use
this on all of our clients as well as the domain modeling.

>>: Yeah. Or the reverse. Like I've encountered problems where I know what word I used for a
concept and I'm trying to figure out what word is used for that concept somewhere else. It can
be really tough [inaudible] like understanding how to [inaudible] concepts amount to stuff I've
seen in the open source world is really difficult.

>> Abram Hindle: So for future work we want to apply more people in the teams
analysis. So imagine doing the RUPVs per author, how would those change the unified process
views we extract, how would those change per author, things like that.

We also have to do
a ton of validation work, some of which requires harassing people and asking
them is this really what happened. We also want to improve the accuracy of some of these
things and maybe do an additional case study.

We also want to look into iteration identification. So basically I've done some machine learning
in the past trying to figure out is this release time or not. And it didn't really pan out that well.
But with this new source of data, it might pan out.

>>: I have a question. Not to be too down on it, but I see the issue identifying releases as
[inaudible] but it's unclear to me what the real benefit of that is. Because if it's in real time, you
can like [inaudible] and if it's retrospective, you can get super high accuracy. The releases aren't
that frequent. You can just go into the project page or something like that. So is it really worth
putting a lot into identifying releases?

>> Abram Hindle: Maybe not releases. But the phases within a release I think. If you can say
a certain phase is a linear combination of certain disciplines or certain signals you extracted,
then you can suggest how much a certain window is, how much a certain time that -- how much
a certain window of development is related to that phase. So are we in a freeze phase, are we in
a testing phase, are we in a heavy implementation phase, things like that. Are we crystallizing.
So those kinds of phases would be [inaudible].

>>: How much of all this could be replaced with an anthropologist hired by let's say the team
manager to sit there and shout it out, get all this information [inaudible] what's the tradeoff here?

>> Abram Hindle: Well, I think definitely some of these tools would be used by the
anthropologist to keep [inaudible] what's going on.

>>: [inaudible]

>>: No, no, no, very good point [inaudible].

>>: [inaudible] anthropologists, that's their whole job.

>> Abram Hindle: [inaudible]

>>: [inaudible]

>> Abram Hindle: But if you're, say, purchasing the company who didn't the a
[inaudible]. A lot of this stuff is after the fact. If you're going to change the process [inaudible]
anthropologist [inaudible] I definitely think some -- I think it's managing language, having at
least the language half in terms of a shared lexicon, shared terms [inaudible] things like that. I
think that's [inaudible].

>>: The hiring [inaudible] company that didn't have the anthropologist, then you need the
archaeologist, but the archaeologist [inaudible].

>>: I would also say there's two other problems with anthropology which is all [inaudible]
which is anthropologist A [inaudible] plus they also, I don't know, I think they spend like 20
years or something [inaudible] so you probably don't want to wait 20 years when you have 15
years of [inaudible] but if back in 1995 someone wanted to know what was going on [inaudible]
say, well, [inaudible]. That might be a problem as well.

>>: [inaudible] you have an anthropologist for two weeks of time for something. Could they
help you identify the relevant topics ahead of time or identify the [inaudible]?

>> Abram Hindle: One of the reasons I was doing the topic analysis was because of yak
shaving. So say you have a story card that you have to fulfill and it causes, say, a performance
regression but that wasn't a story card but you've got to put out that fire, so you've gone on this
long journey to fulfill this story card and you end up over there shaving a yak for some reason.

So the topic analysis I was hoping would maybe highlight what's going on if there's yak shaving
involved, what were some topics that occurred that were important that maybe the manager
wouldn't know about because why didn't you guys finish that story card last week, we're working
on this [inaudible] different --


Okay. Just a quick summary of the process recovery. So we've got our repositories that we
relied on at least for this study, our discussion lists, our bugs in bug tracker, our version control
systems, and we applied a wide variety of analysis, from NFR-related word lists, from
maintenance classes, topic analysis, release patterns, all that, and then we aggregated lots of
those signals up into a -- what we called the recovered unified process views, which was
basically a concrete version of the unified process diagram.

And this stuff was relevant to a wide variety of stakeholders, mostly those not inherently
involved with the code or those who are just beginning in a project and not really sure what's
going on.

Now, I want to go a little bit further about this. This is like the awful selling point. So how do I
think this is relevant to Microsoft. Well, I think there's three main points. So internally for use,
integration in existing products and how existing products would actually help -- in the future help
this kind of analysis.

So for internal processes, I think one of the issues is when you have the globally distributed
development you want to keep tabs on things, you don't always have your manager in the same
location, things like that. You might have the proposed process, and at least maybe you can
validate what the underlying process was and what the differences were.

So at least in terms of Project Dashboard or project timeline, some of this stuff might help, and I
don't doubt that you have some of it.

Another thing would be to look at, say, a successful project and try to see is there actually a
correlation between the successful projects in terms of their processes or teams and look at that
and see if there's any kind of consistency across there.

>>: I have a question.

>> Abram Hindle: Yeah.

>>: In terms of studying the software process, is there a relationship to how MBAs study
business processes and are there lessons to be learned from management schools on what
processes look like or whether they correlate?

>> Abram Hindle: Okay. So I was reading Van der Aalst's book on workflow processes where
basically he wanted to mine the processes and then refactor them. And he used the formalism of
Petri net [inaudible]. So it's trying to look at how that's applied, and Cook applied that, but it
required a lot of work and a lot of tooling.

So I think it is possible to use some of the business process work, but a lot of the business
process work is very, very fine grained and would probably only be appropriate to software in
very small domains, at least in my opinion, like the bug tracker.

The bug tracker oftentimes imposes a process. So that's one side effect of a software. So you
could look at that process and you could use the business process refactoring in order to
improve, well, how can we get nicer bug reports, how can we get people who report bugs by not
[inaudible] things like that.

So I think in some subsets, in some contexts, we can use business process stuff for software. But
in many cases it's a very information work kind of thing, so we don't really have the very strict
states where we can just pass the document on.

I guess one thing to look into would be is there any way to avoid the expertise issue where you
have nonexpert programmers and expert programmers. Because a lot of the business process
refactoring is about parallelizing the work and moving the documents between different people
who might not have a lot of expertise and what's necessary there.

So maybe some of that. But I didn't really go into that in my stuff. And I didn't really feel that
the business processes were all that appropriate to, say, modeling an open source system. Maybe
some of the Apache stuff where they claim they have very strict processes, but not in all cases.
And I think because we're fundamentally information work while developing, we don't really
have this clear kind of staggered stage
stage kind of development.

So there's also various products that might be amenable to having some of the software process
recovery integrated into them, whether for Microsoft or for their end users.

So Project Dashboard in Visual Studio 2010 might help. Some signals might be useful, like the
STBD stuff, the source test build documentation stuff. It might be useful just to see is your
current release similar to your last release. If they're not similar, well, why aren't they similar.
You can go and look.

There's also the word bag analysis stuff which is very cheap to apply, relatively inaccurate, but
still very cheap, once you've got a dictionary you can ship to other people.

And then the topic analysis is sort of interesting, but it eventually becomes a supervised method
if you really want very well-labeled topics. So that might be a little bit hard.

And then the harder stuff to apply would be anything that requires end-user interaction for
training our words. And I don't think that would really pan out and go very far with end users
because, well, I bet everyone in this room has done annotation, and I don't think anyone here
likes it.

Okay. So these were a couple things that could be integrated in, say, Project Dashboard. Project
Dashboard has some interesting stuff in it that I really like, like the burndown charts, the burn
rates, backlogs. These are things which I'd want as signals when I'm looking at or aggregating to
produce the recovered unified process views. Those are really nice signals to have. Because
those are very process oriented. They're usually like requirements or story card based, what's
getting done. I think that's really interesting. And it'd be neat to have it flow both ways, get
access to that and vice versa.

So Codebook is interesting, and I think some of the stuff could go in Codebook. So imagine like
a social view of the recovered unified process views. So you're looking at one project, you're
looking at a couple people involved who you're connected with, what does your aggregate
network look like or what does one of your buddies look like, or has your one buddy been mostly
working in portability related issues, things like that, that kind of analysis. So that might be
really neat.

>>: If you have a smaller corpus, like [inaudible] produced by an individual, then does this
approach scale down like that? Or do you need big bags of words to work on?

>> Abram Hindle: For like the [inaudible]? Well, with at least the NFRs we have shared
terminology, at least in English. So I think with the NFRs you can get away with them --
cross projects and cross individual [inaudible] so you can get away with a lot of training on

>>: But, I mean, you'd end up with just little tiny spikes [inaudible] a whole lot of zero baseline.
The further you go down from -- getting smaller and smaller in terms of people or in smaller and
smaller windows of time, in both cases your bags of words could get small and the signals are
going to just become very spotty.

>>: You could imagine [inaudible] solution approach to a stack [inaudible]?

>>: You could.

>>: You could see if there is a topic that moves across people or there's a spike from one person
[inaudible] from the other people like after noise, they'll all stack up and you can kind of see that
everyone got involved. Actually this one person got involved, maybe there was like another
[inaudible] that turned out to be really significant and everyone had [inaudible].

>> Abram Hindle: Yeah. So I think you could get some really neat interaction graphs out of
this. And like even this stuff, like seeing who's working on implementation immediately, who's
actually doing testing, who's committing build changes. Like this stuff is totally unsupervised
and very easy to apply. So I think that would be pretty neat. And I think you're right about the
word bag. Maybe some people don't use certain terms. But you could also --

>>: [inaudible]

>> Abram Hindle: Anyway, so I'm just saying that there's potential for interaction [inaudible]
the recovery unified process. And at least in the social setting, I think it'd be really interesting,
especially person specific and group specific.

>>: Okay. Here's the [inaudible] if Chris and I work on internationalization, localization of all
the [inaudible], that's our job and that's what we do together, we will never say those words.

>>: Because they're tacit.

>>: Because they're tacit. Because they're -- they are the bubble that is around us. We don't
need to talk about them. So even though that's what we're doing [inaudible].

>> Abram Hindle: But then, again, if someone starts talking about internationalization that's not
part of your group, but because they're buddies, that would be interesting too.

>>: It's going to be the same as a Web where like a link to a page is often much more descriptive
of that page, so anybody who references you would probably also use the word
internationalization, because, oh, jeez, the expert on internationalization [inaudible] and so if you
found links to you in other e-mail lists, then you could figure out that you were the
internationalization person and assign that topic and then understand what the domain is inside
your messages.

>> Abram Hindle: Okay. So then I'm going to cover what can Microsoft do. Basically spy.
Spying hard. So one of the problems I found was I couldn't estimate effort at all, at least with
open source stuff. There was no indication of time. You had lock. But that doesn't necessarily
say how long they spent.

So you could take the big brother approach and you could actually record -- at least this might
work in companies where there's sort of a lesser expectation of privacy. But it doesn't always
happen. Like I understand there's serious issues with it.

But at the very least, if you allowed the commits to be tagged with time spent or effort spent,
things like that, and then maybe allowed the developers to modify them just in case, you know,
like maybe you left the computer on and --

>>: What? [inaudible]

>> Abram Hindle: So I've done not quite this but a little bit of it, and what I found was
interesting about monitoring windows being opened was that you really had to be aware of idle
and non-idle time. And different applications probably had different idle times. So I had mine
set to 30 seconds which would make all my movie watching not count because movies are pretty --

>>: Well, yeah, but you're talking about a particular [inaudible].

>> Abram Hindle: Yeah.

>>: [inaudible] like sometimes when I'm actually developing, I'll go read an MSDN page and
I'm moving around that for 15 minutes --

>> Abram Hindle: But it might be relevant.

>>: Yeah, and it's totally relevant, Visual Studio just thinks I'm totally idle.

>> Abram Hindle: Yeah. So that's also a possibility, like --

>>: [inaudible]

>> Abram Hindle: If you had a Web browser aware.
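The idle versus non-idle accounting discussed here can be sketched as below. Only the 30-second cutoff comes from the talk; the event format, the application names, and the longer browser threshold are assumptions made for illustration. Time between two consecutive input events is credited to the application in focus only when the gap stays under that application's idle threshold.

```python
# The 30-second default is the value mentioned in the talk; the per-application
# override for the browser is a hypothetical value chosen for this sketch.
DEFAULT_IDLE = 30
IDLE_THRESHOLDS = {"browser": 300}


def active_seconds(events, thresholds=IDLE_THRESHOLDS, default=DEFAULT_IDLE):
    """Sum non-idle time per application.

    events: list of (timestamp_in_seconds, app) input events, sorted by time.
    A gap between consecutive events is credited to the app of the earlier
    event, unless the gap exceeds that app's idle threshold.
    """
    totals = {}
    for (t0, app), (t1, _next_app) in zip(events, events[1:]):
        gap = t1 - t0
        if gap <= thresholds.get(app, default):
            totals[app] = totals.get(app, 0) + gap
    return totals


# Made-up event stream: some editing, a long pause, reading a page, more editing.
events = [
    (0, "editor"), (10, "editor"), (25, "editor"),
    (100, "browser"),  # 75 s since the last editor event: counted as idle
    (220, "browser"),  # 120 s of reading: within the browser threshold
    (230, "editor"),
    (250, "editor"),
]

browser_aware = active_seconds(events)
uniform_cutoff = active_seconds(events, thresholds={})
print(browser_aware)
print(uniform_cutoff)
```

With a uniform 30-second cutoff, the two minutes spent reading a page in the browser are dropped as idle; with a browser-specific threshold they are counted, which is the distinction raised in the exchange above.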

>>: I have a hard time trusting developers' estimates.

>> Abram Hindle: So having some kind of concrete measurement might be useful, but there's
definite downsides to it and there's definite [inaudible]. It's still a --

[multiple people speaking at once]

>> Abram Hindle: So this goes back into the commits and adding more metadata to commits.
So something like Visual Studio can provide more information, the structure or the time,
traceability, related artifacts. It could even go through all the Web pages you looked at, which in
some cases would not be great --

>>: Filter out [inaudible].

>> Abram Hindle: Probably have to have a delete button on some of those. It's like Facebook,
Facebook, Facebook, Facebook. Yep.
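One way to picture commits tagged with time spent is a trailer line in the commit message that the developer can edit before committing. The sketch below assumes a hypothetical `Time-Spent:` trailer with values like `1h 30m`; neither the trailer name nor the format comes from the talk or from any particular version control system.

```python
import re
from collections import defaultdict

# "Time-Spent" is a hypothetical trailer name used only for this illustration.
TRAILER = re.compile(
    r"^Time-Spent:[ \t]*(?:(\d+)h)?[ \t]*(?:(\d+)m)?[ \t]*$",
    re.MULTILINE | re.IGNORECASE,
)


def minutes_spent(message):
    """Return the minutes recorded in a Time-Spent trailer, or None if absent."""
    match = TRAILER.search(message)
    if not match or not (match.group(1) or match.group(2)):
        return None
    hours = int(match.group(1) or 0)
    minutes = int(match.group(2) or 0)
    return hours * 60 + minutes


def effort_by_author(commits):
    """Sum recorded minutes per author, skipping commits without the trailer.

    commits: iterable of (author, message) pairs.
    """
    totals = defaultdict(int)
    for author, message in commits:
        spent = minutes_spent(message)
        if spent is not None:
            totals[author] += spent
    return dict(totals)


# Made-up commit log, for illustration only.
commits = [
    ("alice", "Fix parser\n\nTime-Spent: 1h 30m"),
    ("bob", "Update docs"),
    ("alice", "Add tests\n\nTime-Spent: 45m"),
]

effort = effort_by_author(commits)
print(effort)
```

Because the value lives in the message text, a developer who left the computer on can correct the number before the commit is recorded, and commits without the trailer are simply left out of the totals.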

So there's possibilities adding more information such that you could do better tracking of certain
things, certain things you're interested in. Especially structure. A lot of the Smalltalk VCSs, they
cover structure. So you can actually see when structure changes over time.

>>: You mean structure of the code itself?

>> Abram Hindle: Yeah, like the architecture. Yeah.

Another thing that could be improved would be -- so I'm not really sure what you guys use for
project documentation. I assume it's like Word and maybe you commit that somewhere. I'm not
sure. I don't know.

But the point is that one other thing that could help would be the ability to have more traceability
between all these documents. So when you're writing up something and it goes into a mail
message or it references someone else, if there was a way to get better traceability out of that,
and that could be enabled by, I don't know, better document repositories, better analysis of the
documents put into repositories, things like that.

Okay. So in conclusion we've got software process recovery which is the after-the-fact recovery
of software development processes from the artifacts left behind. And this is exploitable by
Microsoft both internally and externally, and I've shown how Microsoft can also improve the
future at least in getting some interesting signals out such that you can better track your project
with some caveats.

Okay. Thanks.

>> Chris: Thanks.