beepedblacksmithUrban and Civil

Nov 29, 2013 (3 years and 8 months ago)

Doing research and experimentation

Cycles in writing and publication


Doing research


Shaping a research project


Finding and reading literature


Research planning


Hypotheses


Evidence


Experiment


Writing


Cycles in writing and publication


Experiment


Designing experiments


Measurements and coding


Describing experiments


Statistics


Hypothesis tests

Beginnings


The origin of a research investigation is typically a moment of insight.

Research ideas often come to mind when the brain is idling, or when separate topics coincidentally arise at the same time.

Tea-room arguments are a rich source of seed ideas.

Beginnings


The first step is to choose to explore ideas that seem likely to succeed, or are intriguing, or have the potential to lead to something new, or contradict received wisdom.

At this stage, it isn't possible to know whether the work can lead to valuable results; otherwise there would be no scope for research.

Shaping a research project


How a potential research topic is shaped into a concrete project depends on context.

Experienced scientists aiming to write a paper on a subject of mutual interest tend to be fairly focused: they quickly design a series of experiments or theoretical goals, investigate the relevant literature, and set deadlines.

Finding research literature

Each research project builds on a body of prior work.

The doing and describing of research requires a thorough knowledge of the work of others.

Google obvious search terms.

Visit the web sites of research groups and researchers working in the area.

Follow up references in research papers.

Browse the recent issues of the journals and conferences in the area.

Finding research Literature


Search other journals and conferences that might carry relevant papers.

Search the publisher-specific digital libraries.

Check the program on a conference web site to find relevant papers.

Use the citation indexes.

Reading


What is the main result?

How precise are the claims?

How could the outcomes be used?

What is the evidence?

How was the evidence gathered?

How were measurements taken?

How carefully are the algorithms and experiments described?

Why is the paper trustworthy?

Has the right background literature been discussed?

What would reproduction of the results involve?

Research planning


Project milestones

Typical steps: download some code or implement something, then experiment, then write up.

Each stage takes longer than anticipated, so the time for write-up is compressed.

It is a mistake to implement a complete system rather than ask what code is needed to explore the research questions.

Research planning


A better approach: explicitly consider what is needed at the end, then reason backwards.

Considering as an example research that is expected to have a substantial experimental component, the write-up is likely to involve:

a background review

explanations of previous and new algorithms

descriptions of experiments

analysis of outcomes

Completion of each of these elements is a milestone.

Experimental Research


Continuing to reason backwards, the next step is to identify what form the experiments will take.

But prior to designing experiments, the researcher must consider how they are to be used.

What will the experiments show, assuming the hypothesis to be true? How will the results be different if the hypothesis is false?

The experiments are an evaluation of whether some hypothesized phenomenon is actually observed.

Experiments involve data, code, and some kind of platform.


Non-Experiment Research

formal investigations of the properties of systems and algorithms

a wide range of studies that are difficult to classify, ranging from proposals for new programming language features and sketches of XML templates for particular kinds of data to reflections on and comparisons of trends in research

Estimate the date for each milestone.

A more effective strategy is to overlap these stages as much as possible.

PhD -- Patiently Hoping for Degree

Research Planning


A typical question in the later stages of a PhD is whether enough research has yet been done, or whether additional work needs to be undertaken.

Often the best response to this question is to write the thesis.

Once your thesis is more or less complete, it is relatively easy to assess whether further work is justified.

Hypotheses


In the traditional sciences, a hypothesis typically concerns some phenomenon in the physical world.

In computer science, some hypotheses are of this kind.

Other hypotheses involve construction, such as whether a proposed method is fit for a certain purpose, and solvability.

Hypotheses in computer science


For example, a research question might be whether it is possible to make better use of CPU cache to reduce computational costs.

Reducing the number of memory accesses can make a program faster even if the number of instructions executed is unchanged.

Sorting algorithm


Hypothesis: a sorting algorithm can be improved by replacing a tree-based structure with poor locality by an array-based structure with high locality.

The research goal is to test this hypothesis.

The phenomenon that should be observed: as the number of items to be sorted is increased, the tree-based method should increasingly show a high rate of cache misses compared to the array-based method.

The data is the number of cache misses for several sets of items to be sorted.
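
Before running a full experiment, the expected phenomenon can be explored with a small simulation. The sketch below is only an illustration under stated assumptions: it models a direct-mapped cache whose sizes (`num_lines`, `line_size`) are made up, and it uses a sequential address trace as a stand-in for an array-based structure and a shuffled trace as a stand-in for a tree whose nodes are scattered in memory. It does not sort anything.

```python
import random


def count_misses(addresses, num_lines=64, line_size=8):
    """Count misses in a simulated direct-mapped cache.

    num_lines and line_size (in items) are illustrative assumptions,
    not the parameters of any real CPU.
    """
    cache = [None] * num_lines        # block currently held by each line
    misses = 0
    for addr in addresses:
        block = addr // line_size     # memory block containing the item
        line = block % num_lines      # cache line that block maps to
        if cache[line] != block:
            misses += 1
            cache[line] = block
    return misses


def miss_rates(n, seed=0):
    """Return (sequential, scattered) miss rates for n items."""
    sequential = list(range(n))       # array-like: neighbours are adjacent
    scattered = sequential[:]         # tree-like: same items, scattered order
    random.Random(seed).shuffle(scattered)
    return count_misses(sequential) / n, count_misses(scattered) / n


if __name__ == "__main__":
    for n in (256, 4096, 65536):
        seq, scat = miss_rates(n)
        print(f"n={n:6d}  sequential={seq:.3f}  scattered={scat:.3f}")
```

While the data fits in the simulated cache the two traces miss equally often; once it no longer fits, the scattered trace misses far more often, which is the shape of result the hypothesis predicts.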

P-lists vs Q-lists

Suppose P-lists are a well-known data structure used for a range of applications, in particular as an in-memory search structure that is fast and compact.

A scientist has developed a new data structure called the Q-list.

Formal analysis has shown the two structures to have the same asymptotic complexity in both space and time, but the scientist intuitively believes the Q-list to be superior in practice.

Define a Hypothesis: P-lists vs Q-lists

X Q-lists are superior to P-lists.

As an in-memory search structure for large data sets, Q-lists are faster and more compact than P-lists.

We assume there is a skewed access pattern, that is, that the majority of accesses will be to a small proportion of the data.
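
To make the skew assumption concrete, the sketch below generates a synthetic access trace and measures how much of it falls on the most popular items. The Zipf-like weights `1 / (rank + 1)` and the function names are assumptions chosen for illustration; the source does not say which distribution is intended.

```python
import random
from collections import Counter


def skewed_accesses(num_items, num_accesses, seed=0):
    """Draw item indices with Zipf-like weights 1 / (rank + 1).

    The weighting is an assumption made for this illustration.
    """
    rng = random.Random(seed)
    weights = [1.0 / (rank + 1) for rank in range(num_items)]
    return rng.choices(range(num_items), weights=weights, k=num_accesses)


def share_of_hottest(accesses, num_items, fraction=0.1):
    """Fraction of accesses that go to the most-accessed `fraction` of items."""
    counts = Counter(accesses)
    hottest = max(1, int(num_items * fraction))
    top = sum(count for _, count in counts.most_common(hottest))
    return top / len(accesses)


if __name__ == "__main__":
    trace = skewed_accesses(num_items=1000, num_accesses=20000)
    print(f"share of accesses to hottest 10%: {share_of_hottest(trace, 1000):.2f}")
```

A trace like this could serve as test input when comparing the two structures, provided the write-up states how it was generated.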

Define a Hypothesis: P-lists vs Q-lists

A hypothesis must be testable. No vague claims.

One aspect of testability is that the scope be limited to a domain that can feasibly be explored.

X Q-list performance is comparable to P-list performance.

X Our proposed query language is relatively easy to learn.

The renaming fallacy

Calling a network cache a "local storage agent" doesn't change its behavior.

It seems unlikely that a text indexing algorithm is made "intelligent" by improvements to the parsing.

Renaming existing research to place it in another field is bad science.

Defending hypotheses


Test your hypothesis and, if it is correct (or, at least, not falsified), assemble supporting evidence.

Raising objections and defending yourself against them is a way of gathering the material needed to convince the reader that your argument is correct.



Defending hypotheses

"The new string hashing algorithm is fast because it doesn't use multiplication or division."

Modulo isn't always in hardware either.

So there is also an array lookup?

What happens if the hash table size is not 2^8?
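
The objection about table size can be made concrete. A hash value can be reduced to a slot with a bit mask, avoiding modulo, only when the table size is a power of two such as 256. The sketch below is an illustration of that general fact, not the algorithm the quoted claim refers to; the function names are hypothetical.

```python
def slot_by_modulo(hash_value, table_size):
    """Reduce a hash value to a slot index; works for any positive table size."""
    return hash_value % table_size


def slot_by_mask(hash_value, table_size):
    """Reduce a hash value with a bit mask instead of modulo.

    This is only valid when table_size is a power of two.
    """
    if table_size <= 0 or table_size & (table_size - 1) != 0:
        raise ValueError("mask reduction needs a power-of-two table size")
    return hash_value & (table_size - 1)


if __name__ == "__main__":
    # For a table of 256 slots the two reductions agree on every value tried.
    print(all(slot_by_mask(h, 256) == slot_by_modulo(h, 256)
              for h in range(100000)))
```

For any other table size the claim of avoiding division has to be defended differently, which is the point of the objection.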

Evidence


Four kinds of evidence that can be used to support a hypothesis:

Analysis or proof: a formal argument that the hypothesis is correct.

Modelling: a model is a mathematical description of the hypothesis.

Simulation: a simulation is usually an implementation or partial implementation of a simplified form of the hypothesis.

Experiment: an experiment is a full test of the hypothesis, based on an implementation of the proposal and on real data.

Evidence


Different forms of evidence can be used to confirm one another.

When choosing whether to use a proof, model, simulation, or experiment as evidence, consider how convincing each is likely to be to the reader.

Select a form of evidence, not so as to keep your own effort to a minimum, but to be as persuasive as possible.

Good and bad science


Questions about the quality of evidence can be used to evaluate other people's research.

Research that consists of proposals, without a serious attempt at evaluation, can be more difficult to respect.

Some science is not simply weak, but can be classed as pseudoscience.

Pseudoscience is a broad label covering a range of scientific sins, from self-deception and confusion to outright fraud.

Pseudoscience

Pseudoscience shares a range of characteristics:

the results and ideas don't seem to develop over time, and systems are never quite ready for demonstration

the work proceeds in a vacuum and is unaffected by other advances

protagonists argue rather than seek evidence, and the results are inconsistent with accepted facts

Often such work is strenuously promoted by one individual or a small number of individuals.

Pseudoscience

An example of pseudoscience in commercial computing is some of the schemes for high-performance video compression, which promise delivery of TV-quality data over 56 kilobaud modems.

Millions of dollars were scammed from investors with tricks such as hiding a video player inside a PC tower and hiding a network cable inside a power cable.

Reflections on research

It is true that, considered as a science, computing is difficult to categorize.

The underlying theories: information theory and computability.

Most research in computer science is many steps removed from foundational theory and more closely resembles engineering or psychology.

A research checklist


Are the ideas clear and consistent?

Is the problem worthy of investigation?

Does the project have appropriate scope?

What are the specific research questions?

Is there a hypothesis?

What would disprove the hypothesis? Does it have any improbable consequences?

Are the premises sensible?

A research checklist


Has the work been critically questioned?

Have you satisfied yourself that it is sound science?

How are the outcomes to be evaluated?

Why are the chosen methods of evaluation appropriate or reasonable?

Are the roles of the participants clear?

What are your responsibilities?

What activities will the others undertake?

What are the likely weaknesses of your solution?

Experimentation


In computing, experiments (most commonly an implementation tried against test data) are used for purposes such as confirming hypotheses about algorithms.

Experiments in computing take diverse forms, from tests of algorithm performance to human factors analysis.

Designing experiments


Tests should be fair rather than constructed to support the hypothesis.

Choose the right baseline against which your contribution is to be compared.

No sensible researcher would advocate that their sorting algorithm was a breakthrough on the basis that it is faster than bubblesort.

Designing experiments


In the process of developing new algorithms, researchers typically use a data set with which they are familiar as a testbed.

If parameters have been derived by tuning, the only way to establish their validity is to see if they give good behavior on other data.

Cross-validation in machine learning
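
Cross-validation is one way of checking tuned parameters against data they were not tuned on. The sketch below shows only the splitting step, under simplifying assumptions: items are identified by index, folds are contiguous, and no shuffling is done. The function name `k_fold_splits` is hypothetical.

```python
def k_fold_splits(num_items, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds.

    Every item appears in exactly one test fold, so parameters tuned on
    the training part are always checked on data they have not seen.
    """
    if not 2 <= k <= num_items:
        raise ValueError("k must be between 2 and the number of items")
    indices = list(range(num_items))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [num_items // k + (1 if i < num_items % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


if __name__ == "__main__":
    for train, test in k_fold_splits(10, 3):
        print("train:", train, "test:", test)
```

In practice the items would usually be shuffled first, with the random seed recorded so the split can be reproduced.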

Designing experiments


Care is particularly needed when checking the outcome of negative or failed experiments.

A reader of the statement "we have shown that it is not possible to make further improvement" may wonder whether what has actually been shown is that the author is not competent to make further improvement.

Designing experiments


For speed experiments based on a series of runs, the published results will be either minimum, average, median, or maximum times.

Results may include some anomalies or peculiarities. These should be explained or at least discussed.

Don't discard anomalies unless you are certain they are irrelevant.
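
The choice between minimum, average, median, and maximum matters most when a run is anomalous. The sketch below, with hypothetical helper names, times a function over several runs and reports all four summaries, so that an outlier stays visible rather than being silently averaged away or discarded.

```python
import statistics
import time


def time_runs(func, runs=10):
    """Time func() over several runs and return the elapsed times in seconds."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        times.append(time.perf_counter() - start)
    return times


def summarize(times):
    """Report the four summaries a paper might publish for a series of runs."""
    return {
        "min": min(times),
        "mean": statistics.mean(times),
        "median": statistics.median(times),
        "max": max(times),
    }


if __name__ == "__main__":
    # Made-up times: one slow run pulls the mean well above the median.
    print(summarize([1.0, 2.0, 3.0, 10.0]))
    print(summarize(time_runs(lambda: sorted(range(100000, 0, -1)))))
```

Whichever summary is published, the write-up should say which one it is and how many runs it is based on.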

Measurements and coding


Measurements can be quantitative, such as number, duration, or volume: for example, the speed of a system.

They can also be qualitative, such as occurrence or difference: whether an outcome was achieved, or whether particular features were observed.

Measurements can be mechanical or human.

Measurements and coding


In computer science research, the sole reason
for coding is to build tools and probes for
observing and measuring phenomena.


The basic rule is to keep things simple.


If efficiency is not being measured, for example,
don't waste time squeezing cycles from code.


Computer scientists get distracted from the main task of producing research tools and instead, for example, develop complete systems.

Measurements and coding


A more reliable, repeatable approach is to run all
experiments from scripts.


Parameter settings are captured within the script;
the settings used last time can be commented
out.



Output from the script can be directed to a log
file and kept indefinitely.


If the output is well designed, it should include
information such as input file names, code
versions, parameter values, and date and time.
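
A minimal sketch of such a script is given below. The record layout, the function names, and the example values (`corpus.txt`, `v1.2`, `runs.log`) are all hypothetical; the point is only that each run appends one self-describing entry to a log that is never overwritten.

```python
import datetime
import json


def run_record(input_files, code_version, params, results):
    """Bundle everything needed to identify a run into one log record."""
    return {
        "timestamp": datetime.datetime.now().isoformat(timespec="seconds"),
        "input_files": list(input_files),
        "code_version": code_version,
        "params": dict(params),
        "results": results,
    }


def append_to_log(record, log_path):
    """Append one JSON line per run so that earlier runs are kept."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(record, sort_keys=True) + "\n")


if __name__ == "__main__":
    # Hypothetical parameter settings; last time's values stay as comments.
    params = {"table_size": 1024}      # previous run: {"table_size": 256}
    record = run_record(["corpus.txt"], "v1.2", params, {"collisions": 17})
    append_to_log(record, "runs.log")
```

One JSON object per line keeps the log easy to read by eye and easy to load later for analysis.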

Describing experiments


Don't just compile dry lists of figures or a sequence of graphs.

Analyze the results and explain their significance, select typical results and explain why they are typical, theorize about anomalies, show why the results confirm or disprove the hypothesis, and make the results interesting.

Describing experiments


An experiment should be verifiable and reproducible.

The description, of both hypothesis and experiment, should be in sufficient detail to allow some form of replication by others.

Reported results should be a fair reflection of the experiment's outcomes.

Experiment Notebook


Record:

versions and locations of software

parameters used in a particular experiment

data used as input (or the filenames of the data)

logs of output (or the filenames of the logs)

interpretations of results

minutes of decisions and agreed actions

Variables


The ideal experiment examines the effect of
one variable on the behavior of an object being
studied.


In practice, elimination of variables is
remarkably difficult.


Even elementary properties can be surprisingly
hard to measure.


Use standard datasets when they are available, such as benchmark problems in machine learning or corpora used to test compression methods.

Variables


Describe the test environment, including the hardware.

Report performance in terms of the characteristics of some commonly available hardware, for example clock speed, disk access time, and so on.


User as a Variable


Humans need to be involved to resolve many
kinds of research question:


whether the compressed image is satisfactory


whether the list of responses from the search
engine is useful


whether a programming language feature is of
value

User as a Variable


Far too many human studies in computer science are amateurish and invalid.

Instructions to the experimental subjects should be clear.

The sample of human subjects should be representative.

The subjects should be unaware of which of the competing methods under review was proposed by the researcher.

Anonymity should be preserved.

Controls, analogous to placebos in medical trials, should be in place.


TREC: Text REtrieval Conference

Participants, a large number of research groups from around the world, apply their retrieval systems to standard data and queries.

The output of the systems is then manually evaluated by human assessors.

The use of standardized resources means that experiments are comparable between research groups.

This commitment to robust experimentation leads to significant impact on the Web search community.

Experiments in Human-Computer Interaction

Statistics


Seeking to answer elementary statistical questions can illuminate experimental design.

Consider the claim "algorithm NEW is typically faster than algorithm OLD".

What the experiments show is that NEW is faster than OLD on average for the runs undertaken in the experiments.

Population: the set of all possible runs.

Representative samples



Statistics


Find out how many strings might conceivably hash to the same slot in practice.

Variables: { the hash table size, the number of input strings, the strings themselves, and a hash function chosen from the class }

Hash functions are determined by a seed value.
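
The sketch below shows one way such a question might be sampled. The hash family is an assumption made for illustration (a simple multiplicative string hash whose multiplier is derived from the seed), not the class of functions the slide refers to, and the input strings are synthetic.

```python
import random
from collections import Counter


def seeded_hash(string, seed, table_size):
    """Illustrative string hash; the seed selects one function from a family."""
    multiplier = seed | 1             # force an odd multiplier
    h = 0
    for ch in string:
        h = (h * multiplier + ord(ch)) & 0xFFFFFFFF
    return h % table_size


def max_slot_load(strings, seed, table_size):
    """Largest number of strings that land in any one slot."""
    slots = Counter(seeded_hash(s, seed, table_size) for s in strings)
    return max(slots.values())


def sample_max_loads(strings, table_size, num_seeds=20, rng_seed=0):
    """Sample hash functions by seed and record the worst slot load for each."""
    rng = random.Random(rng_seed)
    seeds = [rng.getrandbits(32) for _ in range(num_seeds)]
    return [max_slot_load(strings, seed, table_size) for seed in seeds]


if __name__ == "__main__":
    keys = [f"key{i}" for i in range(1000)]   # synthetic input strings
    print(sample_max_loads(keys, table_size=256))
```

Each variable listed above (table size, number of strings, the strings, the seed) is an explicit argument, so each can be varied while the others are held fixed.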





Statistics


Statistical tools that have wide application in
computer science research include correlation,
regression, and hypothesis testing.


Measures of correlation are used to determine
whether two variables depend on each other


Regression is used to identify the relationship
between two variables


Hypothesis tests are used to investigate whether
improvements are significant


Matlab, R, …
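
As a small illustration of the first two tools, the sketch below computes a Pearson correlation coefficient and a least-squares line in plain Python. The function names and the example measurements are hypothetical; in practice a statistics package would normally be used.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def linear_fit(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var_x
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    # Made-up measurements: input size (thousands) and running time (seconds).
    sizes = [1, 2, 4, 8, 16]
    times = [0.11, 0.19, 0.42, 0.80, 1.63]
    print("correlation:", round(pearson_correlation(sizes, times), 3))
    print("slope, intercept:", linear_fit(sizes, times))
```

A high correlation here would say only that the two measurements move together, not why they do.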

Hypothesis tests


In the upper figure, the means are different, but the samples could be drawn from the same population, as the distributions have high overlap.

In the lower figure, the means are different and the distributions are well-separated; these samples are likely drawn from different populations.
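
One simple way to ask whether two samples plausibly come from the same population is a permutation test on the difference in means. The sketch below is an illustration with hypothetical names and made-up timings, not the test used to produce the figures: it shuffles the pooled values many times and estimates how often a difference at least as large as the observed one arises by chance.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_test(sample_a, sample_b, num_permutations=2000, seed=0):
    """Estimate a two-sided p-value for the difference in means.

    A small value suggests the samples come from different populations;
    a large value means the observed difference is unremarkable.
    """
    rng = random.Random(seed)
    observed = abs(mean(sample_a) - mean(sample_b))
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    extreme = 0
    for _ in range(num_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add one to both counts so the estimate is never exactly zero.
    return (extreme + 1) / (num_permutations + 1)


if __name__ == "__main__":
    # Made-up running times in seconds for two algorithms.
    old = [12.0, 12.3, 11.9, 12.2, 12.1, 12.4, 11.8, 12.0]
    new = [10.1, 10.3, 9.9, 10.2, 10.0, 10.4, 9.8, 10.1]
    print("p-value:", permutation_test(old, new))
```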

Intuition


Intuition is often unreliable in the context of statistics.

A long random sequence will have short subsequences that appear non-random. If a selected subsequence has a pattern, it is easy to jump to an incorrect conclusion.

Observers tend to make unsupported extrapolations from small numbers of events.
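
The point about random sequences is easy to check. The sketch below, with hypothetical function names, generates simulated fair coin flips and finds the longest run of identical outcomes; in a long sequence that run is typically much longer than intuition suggests.

```python
import random


def random_flips(n, seed=0):
    """Return n simulated fair coin flips as a list of 'H' and 'T'."""
    rng = random.Random(seed)
    return [rng.choice("HT") for _ in range(n)]


def longest_run(flips):
    """Length of the longest run of identical consecutive outcomes."""
    longest = current = 0
    previous = None
    for flip in flips:
        current = current + 1 if flip == previous else 1
        longest = max(longest, current)
        previous = flip
    return longest


if __name__ == "__main__":
    for n in (100, 10000, 1000000):
        print(n, "flips, longest run:", longest_run(random_flips(n)))
```

Seen in isolation, such a run looks like evidence of a biased coin, which is exactly the incorrect conclusion the slide warns about.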

An experimentation checklist


What is to be measured? How is it to be evaluated?


What code has to be obtained? What data has to be
gathered? What has to be implemented?


Should the experimental results correspond to
predictions made by a model?


What enduring properties might be observed by other
people attempting to validate the work with different
hardware, data, and implementation?


Have appropriate baselines been identified?


Do the results make sense? Are they consistent with
any obvious points of comparison?

An experimentation checklist


Is the code going to be made publicly available? If not, why not?


What variables might influence the results? How do
the experiments distinguish between the effects of the
variables?


Are statistical methods necessary for validation of the
results?


What is the population? How is a sample to be taken?


Are notebooks being kept? What is being recorded in
the notebooks?


Is ethics clearance required?