CASA User Testers Meeting



Note: Dinner event at El Sombrero, Thursday evening


(General note: see also the speakers’ slides for more specific details)


Editor’s Preface


These notes are largely intended to reflect the general drift of the meeting discussions, and to capture the Action Items (red text) and major decisions (blue text). Like the Hitchhiker’s Guide to the Galaxy, they undoubtedly have many omissions and contain much that is apocryphal, or at least wildly inaccurate, but will hopefully at least act as a reasonable jogger of memories in the future.


Day 1


Tuesday


Jeff Kern: Introduction


CASA Progress and Tracking. Jeff describes his new Jira-derived progress tracking plots. See an example with the key for details. The plot is matplotlib with database access to Jira, and is solely for internal CASA use. We have got (very roughly) about ¾ of the things that were June stable targets.


Speakers should link their talks to the agenda when possible.



Urvashi


CASA Imager Model


(See slides. Presentation is largely image/diagram-based.)


An overview of the basic major/minor cycle distinction was presented, along with a summary of the various clean methods. Fundamentally, they are just iterative chi^2 minimization processes (“Data − Model = Residual”). The starting model is zero for most sources.
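To make the loop structure concrete, here is a minimal sketch of the major/minor cycle (illustrative only: degrid and grid stand in for the real prediction and imaging machinery, and a real minor cycle, e.g. Hogbom, would subtract the scaled PSF rather than a single pixel):

    import numpy as np

    def clean_loop(vis, degrid, grid, n_major=5, gain=0.1, n_minor=100):
        # Starting model is zero.
        model = np.zeros_like(grid(vis))
        for _ in range(n_major):
            # Major cycle: Data - Model = Residual (formed in the visibility domain).
            residual = grid(vis - degrid(model))
            # Minor cycle: iteratively move flux from the residual into the model.
            for _ in range(n_minor):
                peak = np.unravel_index(np.abs(residual).argmax(), residual.shape)
                component = gain * residual[peak]
                model[peak] += component
                residual[peak] -= component   # crude; real clean subtracts the PSF
        return model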


Imaging options were then discussed.


Major Cycle


Gridding: convolutional resampling. The Gridding Convolution Function (GCF) was then described. Prolate spheroidal is used for simple imaging. W-projection corrects for large-field distortion effects, in which the 2D approximation is no longer valid. A-projection corrects for antenna-related characteristics. There is also the SD gridder. A much finer grid is used for preconvolution, and then the value for the actual target grid is evaluated (precomputing and storing these is resource-intensive; it would probably be better to evaluate these on the fly; parallelization is a consideration).
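As a concrete illustration of convolutional resampling, here is a minimal sketch under assumed names and shapes (real gridders use an oversampled, precomputed kernel and correct for the GCF taper after the FFT):

    import numpy as np

    def grid_visibilities(uv, vis, kernel, ngrid, cell):
        # uv: (N, 2) u,v coordinates in wavelengths; vis: (N,) complex values;
        # kernel: small 2D GCF (e.g. prolate spheroidal); cell: uv cell size.
        grid = np.zeros((ngrid, ngrid), dtype=complex)
        half = kernel.shape[0] // 2
        for (u, v), value in zip(uv, vis):
            iu = int(round(u / cell)) + ngrid // 2
            iv = int(round(v / cell)) + ngrid // 2
            # Smear the visibility onto neighboring cells, weighted by the GCF.
            grid[iv - half:iv + half + 1, iu - half:iu + half + 1] += value * kernel
        return grid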


Baseline-based averaging can be used. If this is overdone, though, then there is smearing across the uv plane. It’s not too bad for simple imaging, but when using additional convolution functions, it can sometimes be a problem. The longer baselines track across the uv plane more quickly, so they provide the primary constraint.


Minor Cycle Algorithms


Point sources: Hogbom Clean, Clark Clean


Point/Extended Sources: Maximum-Entropy Method, Adaptive-Scale Pixel Clean, Multi-Scale Clean


Wide-band Images: Multi-frequency Clean (with or without multi-scale): reconstructs the spectrum as well as the image. Matrix equation: multi-term algorithms can be memory-intensive, even though we can use tricks like only storing the upper triangle of the matrices, etc.
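A minimal sketch of that upper-triangle trick for a symmetric matrix (the names here are illustrative, not CASA’s actual data structures):

    import numpy as np

    n = 4                               # e.g. number of Taylor terms (assumed)
    iu, ju = np.triu_indices(n)         # row/column indices of the upper triangle
    packed = np.zeros(len(iu))          # only n*(n+1)/2 values are stored

    def element(i, j):
        # Symmetry: H[i, j] == H[j, i], so look everything up in the upper half.
        if i > j:
            i, j = j, i
        k = np.flatnonzero((iu == i) & (ju == j))[0]
        return packed[k]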


Imaging Options


Multiple Fields


Works with N smaller sized images (deconvolve N images separately).


Facets: e.g. separating the image into 4 quadrants. The major cycle is done on each part separately, but the minor cycle processes the whole image. This is the older way of getting around large field-of-view effects (compared with using W-projection correction).


Mosaicing: Can be done separately, or as a “joint” mosaic (larger overall image than each individual primary beam). Aperture-illumination functions with phase-gradients as GCFs (this is essentially a form of A-projection).


Then, we can extend all of the above into N data channels…

This is cube imaging, etc. N data channels are binned into M image channels. Image channels are always in the LSRK reference frame. Conversion to “velocity”, etc. is only axis relabeling (not regridding).
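Since that conversion is pure relabeling, it is just the radio-convention formula applied per channel; a minimal sketch:

    import numpy as np

    C_KM_S = 299792.458  # speed of light, km/s

    def radio_velocity(freq_hz, restfreq_hz):
        # v = c * (1 - f/f0): relabel a frequency axis as radio velocity.
        return C_KM_S * (1.0 - np.asarray(freq_hz) / restfreq_hz)

    # e.g. channels around the HI line at 1420.405751786 MHz
    print(radio_velocity([1.4200e9, 1.4204e9, 1.4208e9], 1.420405751786e9))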


Wideband case: makes use of combined uv coverage from all channels together: one image.

Frequency axis: multi-frequency synthesis (N terms > 1). Combined uv coverage and broadband sensitivity.


Correlations axis: Correlations and Stokes parameters. The user can choose to make images of various combinations of Stokes parameters.




George noted that the choice of possible pairs, sets of four, etc. that can be generated is something of an artificial one. It largely arises as a computational oddity, in that if a user wants to generate (say) a particular set of three, then there would still be the computational overhead of generating the fourth anyway.


For a list of the required imaging options, see slide 13.


There was some Rotation Measure (RM) Synthesis discussion here.

Kumar raised the point that overall efficiency is increased by having different FTMachines for different fields.


Continuum Imaging:


Parallelization does not occur in the minor cycle, only in the major cycle.


Parallelization: May want to ensure synchronization between spws, in which case it might be better to have a single Iteration Controller, rather than one for each spw. There are efficiency trade-offs. Both options are possible, and can be plugged together at the Python level.


Tasks are set up so that it’s possible to run a full “clean”, major cycle only, or minor cycle only.


George noted that the flexibility is great, but it would perhaps be possible for a user to write something convoluted/impossible. Urvashi noted that controls for such things can still be applied higher up, but it is important that they are not built into the architecture.



Jim Jacobs, Architect: VisibilityIterator2 and VisibilityBuffer2


(Slides available)


First slide is the overall CASA architecture. VI/VB fits in at the CASA Data Objects (CDO) layer. Also CalTables and the new MeasurementSet. There is currently a lack of separation between the Science Logic Layer (SLL), CDO and even the underlying infrastructure layer.


George noted that the sorting of data probably needs to be explicitly listed as one of the items under the “abstraction to hide the lower-level layers” items list.


Action Item on Jim: Provide a list of “blessed” sorts.


Action Item on George: define the CalSort.

There is the default, standard sort that imager uses: the CalSort.




The VI acts as a pointer to a chunk of data; the VB is the way to access that chunk of data.
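The usage pattern is a two-level iteration; a rough Python transliteration of the pattern (the method names here are invented for illustration; the real classes are the C++ VisibilityIterator2/VisBuffer2):

    def process_ms(vi, process):
        # Outer loop: chunks (e.g. grouped by the chosen sort columns).
        vi.origin_chunks()
        while vi.more_chunks():
            # Inner loop: subchunks small enough for one VisBuffer.
            vi.origin()
            while vi.more():
                vb = vi.get_vis_buffer()   # the VB accesses the current chunk
                process(vb)                # e.g. read visibilities and flags
                vi.next()
            vi.next_chunk()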


The underlying data might actually be computed on the fly. George: Is it implicit that computed MSs are consistent with (say) the unaveraged one?


Darrell: The viewer has to access the averaged subtable, but would the flagging have to go back and write back to the original table? This would be messy, and technically probably not possible now.


Jeff: The viewer could generate selections for flagging, which could then be picked up and used afterwards to go back and be applied. George was concerned about the system-intensive impact of interactive flagging if done this way. He notes that the box specifications would then still require that all the meta-information be worked out for all the points: the box might be drawn around averaged data in the viewer, but it would still be necessary to go back and get all the actual underlying values, which proliferates large numbers of per-point meta-information and hence flagging commands.


Requirement that got missed: When doing these transformations, we need to be able to take these selections, and then transform them back to the data.


The flag commands need to be expressed in meta-information, but the boxes users draw in the viewer are not expressed that way.


Channel-averaged and time-averaged cases are easy, but (e.g.) a changed reference frame implies spline usage, etc.


VI2 does little on its own, but is effectively polymorphic, as it depends on what is plugged into it. When you build an averaging VisibilityIterator, …


Kumar asked how much of the code is effectively dead (i.e. never visited). The compiler can identify dead code, but generally only on a small scale. There is also the issue of features being requested as soon as the unused code that already does it is deleted…


MSTool is probably not using the VI. Msmd also does not.



(Coffee break)



Kana Sugimoto: SD and ASAP Package


(See slides)


ASAP: “ATNF Spectral line Analysis Package”. Originally written in the early 2000s to handle single-dish, single-line datasets (which are not very large).




ASAP (SD) tools: Reduction and imaging. Less complex than for interferometric datasets. The CASA simulation tasks used are the same as for interferometry (simobserve, simalma, simanalyze). Spectral calibration and spectral analysis parts are different, but imaging and simulation are common.


The reference frame used is “as is”, rather than always LSRK. There is an option to detect a spectral line each time the baseline is subtracted, however, so the scenario in which a line is moving can be handled.


The slide shows “split by antenna” at the time of import. This is specifically for ALMA, as it uses more than one antenna to invoke simultaneous SD observations. However, the internal data format in ASAP is designed to handle a single antenna per dataset (via the scantable structure), so splitting is necessary. They are currently sequentially reduced, but parallel processing on a simple cluster would probably be a good way to go forward in the future. The data here are being imported into scantables. Justo recalled that there was a version of partition available that handles scantables, so with the existing multiple-scantable format, it may be possible to run through all the antennas in parallel. It might be good to separate the data out on the disk on a per-antenna table basis.


Regarding simobserve: the simanalyze part that generates the data model may end up just using the same code as SetJy.


SD tools are defined in ASAP, which has a number of dependencies.


These days, developers should build ASAP into the developer builds, after casacore, code and gcwrap.


Darrell: Why do we have four cmakes to build? Can they be unified?


Scott: these should not be unified, as the time to traverse all the trees increases nonlinearly. However, there is a (rough) script already in existence to do all the necessary cmakes. A null build of ASAP on (say) Kumar’s slow-ish, 4-core machine takes about 8 minutes. It usually only takes about 4 or 5 mins.


There were no objections by those present to requiring that ASAP be built by CASA developers.


CASA-SD Development Organization: new features and performance improvements (see slides for details).


Currently awaiting ASDM format data for testing of Lissajous scan-type datasets.
type datasets.


SD tool interface layer is also in Python, which also slows down the processing.


(An aside: gcc 4.2.1 on OS X 10.6 generated code that would always segfault. After 4.1, we stopped shipping on 10.6. Builds are done using the debug option whenever possible.)




TableIterator is used to iterate for I/O on the scantables. Justo thought that introducing an iterator scheme for the scantables would make a performance difference.

Regarding matplotlib: even using agg (also used by Pipeline) as a backend is slow. A faster, simpler plotter would generally be desirable.

Jeff noted that the VI/VB framework should be able to provide the spectra, so if SD could start to use this, then any subsequent optimizations to it would also benefit everyone.



(Jeff & Scott left at this point for an interview)



Kumar Golap: The Virtual Model Column


(See slides for details)


George: For the mosaicking case, does the model have primary beam effects embedded in it, and would that be different for each field? Kumar: Yes, each one has its own FTMachine.

Urvashi: What about dependencies on other parts of the system? It will only be running the FTMachine.


Sanjay: For things like W-projection, A-projection, etc. the overhead impact could be considerable for large datasets.


Yes: there is a cut-off point at which this is then slower than reading from the disk, particularly if the model is relatively large. (See Kumar’s “caveat” slide.) This also applies to cubes.


Urvashi: does this mean that we have to come up with the rules to indicate which of the two approaches to use? Answer: Probably…


Sandra: Is there any plan to fix the table browser to support the table record column? Kumar: these can be different sizes, so it is not clear how this could be done. We don’t know how the TB will display such a structure. At the moment, a warning is given. These are not keywords any more; they are cells with content.


There is an MSTransform option to create a real from a virtual data column. If you do this and then apply a transform on the physical data column, does this affect the virtual one? Kumar: Right now, if the virtual column exists, the priority is that this is what gets served. So, for this parameter, we should delete the virtual.


Justo: The only transformation that doesn’t work as described is Hanning smoothing? Could we add a Hanning smoothing parameter to the model, so that it serves the data? In that case, all transformations could be represented in the model.



(Lunch)



Jim Jacobs: CASA Core Meeting

(As usual, see slides)


Code evolution (slide 4): Use C++11 features where possible. This might just be a matter of stripping out the code, and replacing it with a wrapper to the STL equivalent.


Which gccs support all of the required C++11 features for casacore? 4.7, 4.8? Shared pointers are already available with all gccs that we currently use. 4.8 is almost complete. On RedHat, we are using 4.1 with some backported features. Clang based on gcc 4.2 is also used.


An ifdef can be put in to allow people with old compilers to work with the old implementation.


True thread-safety could be built if it was really needed. It might result in a lot of lock/access contention.


Question: Statics: mutex (mutual exclusion object) vs. atomics? A lot of the statics could perhaps be replaced by protected singletons, if we really had to. Getting rid of the statics in (e.g.) the table system would be difficult.


Table system: for Int64 columns, row numbers might need to be used. Jeff: Row IDs: OK, but why make them a datatype? E.g. a column of 64-bit integers. No one present was aware of a use case.


The rumor is that Ger will be retiring in a few months. Question: Is it Ger who pushes it into Google code? If he really retires and stops doing casacore Google merges/check-ins, then it will probably be time to rethink the process.


Ger had merged everything from our codebase up to April 22nd, and was then going to …


We didn’t want people checking things in to the casacore branch and affecting us, but this has changed since then. If we have to, we can always take an svn snapshot and publicly release that. There are things in images that Ger and Walter are not so keen on.


Kumar: there is also still stuff in there, like latticecleaner, that is no longer maintained, but that is being used by people.




A lot of other places use casacore: they check out a snapshot, add their own stuff, and then do not check it back in. We do have an ownership stake in the master version in Google. They cannot, however, “torque” the version that we use.


The table system is built on top of other parts of casacore. Darrell has had some concerns that we are susceptible to changes from outside. Ger also still makes some low-level changes.



George Moellenbrock


Cal Library & Virtual Corrected Data


(see slides, yadda, yadda…)


Urvashi: The proposed setup is essentially similar to flagging.


Darrell: What are you going to use to parse the file? Right now, we just have something that parses the KEY=VALUE pairs and turns them into a Python dictionary. The dictionary becomes a record when passed to C++, and there is a requirement that all the required keywords are present. However, if there is a small typo in the file, then a value may be mysteriously unused.

Sandra: We do something similar for the list mode in flagdata. We read these and create a dictionary.
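A minimal sketch of that KEY=VALUE parsing (the keys shown are illustrative, not necessarily the real cal library keywords; note how a mistyped key passes through silently, which is exactly the “small typo goes mysteriously unused” failure mode):

    def parse_keyvals(line):
        # Split on whitespace, then split each token at the first '='.
        out = {}
        for token in line.split():
            key, _, value = token.partition("=")
            out[key] = value.strip("'\"")
        return out

    print(parse_keyvals("caltable='B.cal' tinterp='nearest' spwmap=[0,0]"))
    # {'caltable': 'B.cal', 'tinterp': 'nearest', 'spwmap': '[0,0]'}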


Re: the first CalLib example: Darrell: what does this save you over a plain Python file? One suggestion: you could have a Python object containing all of the necessary parameters, and then just use the Python “compile” command.


One of the requirements we got from the users is that these structures be user-readable. The specification was received from Science. There was some discussion as to whether or not users would be willing to edit the items themselves. In other words, the essential difference is just a matter of adding the “=”, “{}” and naming it… There was concern that yet another format would require additional maintenance effort (a new parser, etc.). On the other hand, it could be argued that we would otherwise be building in a language dependency.

We could look into using something like pyparsing…


Darrell: Tools were supposed to have a state, tasks were not. This could stretch that distinction.



Jeff: Two differences nowadays: our users are getting more sophisticated, and we have a pipeline.


This looks like applycal should take a vector of per-caltable dictionaries.


Eventually, we end up with a master list of caltable dictionaries.




This will allow you to do it with one applycal, which means you can do it with just one pass through the data. This will help enable scratchless operation.


If you want to do OTF and look at 3C273 and 3C279 at the same time, then you cannot currently do this. This would also let you do that.


In an MMS, if a concat has been done, then each one has a unique obsid. Even for a virtual concat. But pipeline does not use concat, so this will not be the case there.


George worries that it may be possible (too easy) for users to not get the uniform calibration that they think they have set up…


Steve specified: [TO, FROM, WITH] vs. George’s recommended ordering: [FROM, TO, WITH]

Urvashi made the case that we now already have multiple instances of different syntaxes...


The broader philosophical question raised here is: Where are the distinctions between the specification, design and implementation?


Jeff: There is the specification of what it needs to do. The implementation needs to be able to take that, go through the data, and process it accordingly. How it gets done “under the hood” was done by George. The specification of what it looks like to the user is something that Steve has some right to input. There had been some iteration back and forth between Steve and George on this.



Susan and Adam working on the viewer is probably the closest thing we have to agile development. There is a pretty clear line between what each party does.


Several developers expressed some concerns that this sort of line is sometimes rather unclear.


The general consensus was that a reasonable iterative process with compromises on both sides is the ideal way to proceed.


Maybe we should be more proactive in getting feedback from the CSSC.


Returning to the CalLib topic:


George has doubts about the support of intents (but Lindsey uses them already)…


George (in response to Lindsey): If people want to work with dictionaries, then that should be fine.


Virtual Corrected Data




The idea of an “established” list was discussed.



(Coffee)



Open discussion about planning strategies


Some discussion took place about the user inputs and requirements handling process. Some developers felt like they might not always have enough input on design.


A set of guidelines (e.g. tasks are stateless, tasks are no longer than 20 lines long, etc.) agreed between the developers and the CSSC would be a good thing. Different levels of help details might be good.


Can perhaps ultimately move some of the big tasks away from the core CASA and into the (to-be-included) Pipeline.


When something arises that boils down almost to a matter of taste, do we do what the developers or the CSSC prefer? If there is one vote for something and one against, then Jeff may have to make a call. If it’s (say) a comparatively minor, predominantly user-facing item, then he would probably go with what the CSSC prefer. However, if there is a clear reason not to do something, then we will not do it.


For example, the new handling of velocity in the viewer (imreframe) needs to be easy and uniformly available everywhere, in order to easily convince users that it is not needed in clean. In other words, once they know how something is uniformly handled in CASA, then they get used to being able to rely on it.


George: do we have an overall design of the calibration and imaging method? A “Theory of Operations” document.


Should we do a boot camp on how the CASA infrastructure works? Focusing less on algorithms and new bells and whistles, and instead on “This is an image, this is the reference frequency, etc.”, for a couple of days. In other words, place the focus on the “Why”, rather than the “How”.


Homework for tonight: think about what should be part of the agenda for such a boot camp.


(End of the day)



Day 2


Wednesday


Sandra Castro


Flagging


(Copy of slides on wiki page)


There is only one backup file, so it is currently not possible to go back and just undo one bit of flagging.


Every flag command (FLAG_CMD) file line may contain the data selection parameters (which might overlap), and then a loose union is evaluated.
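For illustration, flag commands can be thought of as data-selection dictionaries whose (possibly overlapping) selections are unioned; a toy sketch (exact-match selection only; real selection parsing handles ranges such as “4~7”):

    commands = [
        {"mode": "manual", "antenna": "ea03", "spw": "0"},
        {"mode": "manual", "antenna": "ea03", "scan": "4~7"},  # overlaps the first
        {"mode": "clip", "clipminmax": [0.0, 10.0]},           # evaluation command
    ]

    def selected(row, cmds):
        # A row is flagged if it matches ANY manual command (loose union);
        # evaluation commands (e.g. clip) must instead be run over all points.
        return any(all(row.get(k) == v for k, v in c.items() if k != "mode")
                   for c in cmds if c["mode"] == "manual")

    print(selected({"antenna": "ea03", "spw": "0"}, commands))   # True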


We have to exhaustively evaluate the effect of all flag commands, because we don’t know a priori which ones can safely be ignored. George was concerned about having to work through a big list that comes from a plotms selection. However, optimizations have been implemented, and testing has been done with text files with thousands of flags. Justo: hundreds of manual flag commands are faster than the automatic option. George: but if we say that flag commands can be used for (e.g.) plotms, plotms can generate (hundreds of) millions…



There are manual flag commands and evaluation ones. The latter have to be run on all points, the former not.


Jeff: The question is: given a plotms/viewer representation, how many flags does that expand into? If it is really tens of millions of lines, then performance could indeed be a problem…


George: If a box has been drawn on an averaged plot, then this still has to be translated back to the original, unaveraged data. He is concerned about the possible need to convert meta-commands into many real flagging commands.


So, if you make a flag command that has one kind of averaging in it, and then you want to perform flagging on a different averaging basis, then you have to evaluate that as well.


General consensus: draw a box in plotms; that gets evaluated once and for all into flags, and that would avoid having to keep reevaluating the effects of drawing boxes.


Jeff: Open up the MS; flags already exist in the flagdata column, and then all you want to do is create additional flagging commands. When you draw the boxes, you generate a list of the new flagging commands, make use of this ability to apply this (fairly short) list on the fly, and then at the end of the session, you actually apply it and then write all the actual flags out once.


Urvashi: the use case she had in mind was simpler: write the actual flags when the selections are made.



Action Item: Justo to implement the average-based flagging and back-propagation.


Decision: Viewer and plotms, instead of writing through to the flag columns, should be writing through the flag agents, once we have a suitable visbuffer structure in place.


Justo advocates keeping an interactive flag backup, so that the user could revert back to before the session. If the user actually wants to see a user-readable explicit list of flags, then he sees no other solution than to evaluate a full list of the flags (at which point, there is little …).


There are two issues:

1. The first is having the viewer and plotter write through the visbuffer interface, so the plotters have direct access.

2. The second, an option to repeat user-generated interactive flagging commands: we don’t yet have a solution. A flag backup could be used to revert a bit. If someone has filled the data, then the flag backup cannot be used.


Plotms reads the data, generates a plot, the user draws boxes; then there may be a need to run through the dataset in order to apply the flags.


Could have one flag command recorded per box. Could transfer this to a clip command for plotms.



George: But plotms can draw from multiple visbuffers...


Darrell: is it possible to have an “undo” option?


You have to supply your current state and evaluate what your flag commands are.


Two models (for example):


The interface we provide is a set of selections and state; we take that in the visbuffer, and back-translate it to what we apply to the data.

Or, do we pass a flag cube, and then evaluate from that?


Action Item: Darrell, Urvashi, Justo, Sandra: define what this interface looks like.


Lindsey: Why actually do I/O, rather than set them up as a filter?


View of the MS: How about being able to flip between flag versions, and compare them?




The flagging history was a bit of a stop-gap solution. The history was kept outside the MS, as Ger was resistant to its inclusion. There is already a ticket from Juergen about this.


Action Item on Jim: Review the design of the flag versions, with a view to putting forward a proposal to Ger.


Urvashi notes that a lot of the problems being addressed by Sandra’s new proposal were raised a couple of years ago with Science…


George: users with clever flagging could have problems when standard calibration changes (as it currently has been doing).


Sanjay: How much is flagcmd vs. flagdata being used?


Flagcmd is being used by Science to apply the online flags in batch mode, to analyze the online flagging behavior.


Pipeline does not use the table.


Some of the columns in the FLAG_CMD sub-table are perhaps not so appropriate (e.g. the “TYPE” is redundant).


Jeff: We will write a one-page description of what is currently broken with it, and make the case that it be deprecated and removed, or at least made read-only.


Action Item on Sandra: Write down an explanation of what’s wrong with it, why it’s inconsistent, and what should be done (at least make it read-only, or remove it).



Jeff Kern


Weight and Weight Spectrum


(See slides)


Slide 1: there is also a Sigma spectrum.


George raised the issue of the need to re-evaluate these again, as the weights should correspond to the corrected data…


The right quantity to use should be weight spectrum, not weight.


Question: do we need a “corrected weight spectrum” column, analogous to the “corrected data” column? Why do we need sigma spectrum and weight spectrum? Isn’t there redundancy?


You need both, if you are supporting the concept of both data and corrected data simultaneously. Sigma is a piece of data we get from the telescope; weight is usually 1/sigma^2, but not always. For SD data, users often want to customize the weights.


Conclusion: we need sigma, weight spectrum, and corrected weight spectrum.




Is it worth keeping this unchannelized weight?


George proposed to just always use weight spectrum as the real column (the former): real, imaginary, weight. This would increase I/O by 30%, but we would gain in terms of code simplification.


Justo thinks there would also be additional performance costs associated with going through it a lot of times in memory doing it Jeff’s way.


Consensus: We should use weight spectrum and corrected weight spectrum everywhere in the code. We use a compressed storage manager on the weight spectrum.
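A minimal sketch of the relationship between these columns (the shapes and the mean-based summary are assumptions; the usual, but not universal, rule is w = 1/sigma^2 per channel):

    import numpy as np

    sigma_spectrum = np.array([[1.0, 1.0, 2.0, 1.0],   # (ncorr, nchan)
                               [1.0, 4.0, 1.0, 1.0]])
    weight_spectrum = 1.0 / sigma_spectrum**2          # per-channel weights

    # The unchannelized WEIGHT is then just a per-correlation summary.
    weight = weight_spectrum.mean(axis=1)
    print(weight)   # [0.8125   0.765625]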


George: could we perhaps express the raw weights as a function that could then be evaluated? This might be one way to compress in the future…



100% of the ALMA cases will be passed through a weight spectrum. Eventually, probably the VLA too.


The weights from ALMA are nominally from bandwidth and time.


First thing: implement weight spectrum and corrected weight spectrum throughout the code, then come back and look at compressing it to make it more efficient.


Science scenarios: downweight atmospheric lines in ALMA spectra; “holes” in Carilli’s data (currently getting flagged out, but corrected weighting would be better). Also, channels go down at the edges: the channels are good, but should be downweighted.



Sanjay


Moving selection into the VI/VB


(See slides, etc.)


This will enable things like a selection in (say) velocity, etc.


Currently, the Reference MS (list of rows) could be persistent (i.e. written to disk), but does not have to be.


In the proposal, there is no Reference MS; the selections are done at the visbuffer point.




George argues that there might still be use cases in which one might want the reference (not always, of course, but it might be good to keep the option).


Kana: Any possible performance hit? Jeff: It probably depends on what your selection looks like. In the case where you are skipping tiles, the reference MS route might be faster. If reading all tiles, but not processing all, then it will be the same.


A processor core is not being used for the asynchronous I/O; if selections and TaQL expressions are being done, then this might take up a couple of cores, so there would still be something of a penalty.


The in-row selection is already going into the VI/VB anyway.



Need knowledge of the whole scope of what rows have been selected.


What is being proposed is essentially a re-indexing on the fly. A selection has been done (e.g. rows 0, 2, 4), and they are then relabeled in the selection as (0, 1, 2); we have not done a good job of keeping track of which are internal keys and which are external keys. We are currently using row numbers as keys…
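A tiny sketch of the internal/external key mapping in question (illustrative only):

    # External keys: MS row numbers that survived the selection.
    selected_rows = [0, 2, 4]

    # Internal keys: consecutive indices used after the re-indexing.
    to_internal = {ext: i for i, ext in enumerate(selected_rows)}
    to_external = dict(enumerate(selected_rows))

    print(to_internal)      # {0: 0, 2: 1, 4: 2}
    print(to_external[1])   # 2; this reverse mapping is what gets lost if keys are mixed up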


The MS selection is still working as it is currently, but the VisIter (or perhaps the MSIter) is then responsible for creating the reference MS (based on the MS and the selection information).


In principle, the input MS could already have been pre-selected before it goes into the MSIter, but the MSIter has no knowledge of that.



(Coffee)



Scott Rankin


Changes to Build and Distribution System


(Slides, etc.)


Some brief discussion of which platforms are supported, and which compilers are used.


Proposal: Testable packages would come straight off the trunk; eliminate the casapy-test branch. The trunk should always run. Developers wanting to test issues can use a range of alternatives. Ideally, trunk should always build.


Decision: the casapy-test branch will be eliminated.


Stable package: record the svn revision(s), and if we discover a problem that has to go in, branch from a stable build if we need to.




Concern about a race condition: e.g. close to a ready stable with one thing to fix. That gets fixed, but someone breaks something else in the meantime. Solution: always go back to the near-perfect version, and apply the fixes to that.


Decision: stables to be built on the proposed revised basis (see slides for details).


Pipeline do not want to deal directly with the full CASA C++ build system. This can probably be done. We do not yet have a Pipeline release process in place. We could perhaps include it in the CASA package, separately, or something else?



(Lunch)



CASA Third Party Packages

Kana: When will the 3rd-party packages for OS X 10.8 be available? To be discussed with Shinnosuke Kawakami next week.


If devs need a new third-party package, it should be run past Jeff and Jim first, and probably not within the last couple of months before a release.


It will probably be necessary to continue to package Python. In terms of third-party packages, the aim would be to use as much of the already-present system as possible.


Wes has seen things break when building under MacPorts (for example, building the DBus stuff). Wes argued that dependencies almost have to be tracked manually. We could maintain our own overlay ports file (probably based on Darrell’s existing file) using our chosen versions, and run our own repository. MacPorts should improve consistency.


“Don’t make Scott think!”




The MacPorts effort will be approached cautiously.



Testing


Main layers:


Regression: Groups. ~8 hours to run the whole suite on a powerful machine. The ones run every night take something like 4-6 hours, but this is not the full suite.





RunUnitTest performs vital functional testing of the Python layer. The full suite is run for package assessment tests, and takes ~2 hours on an 8-core machine (with no specific attention paid to parallelization, other than what is already in there). These are already broken down by module.





Both casacore and code have atomic tests under /test/… (t_*.cc). Some of these depend upon the data repository.

Some 50-60-ish CasaCore tests will fail without access to the data repository. CasaCore will diff output with existing files. For CasaCore, the set of tests typically takes ~2-3 minutes to run. Code: the set of tests takes ~0.5 hours?


In order to only run a subset of tests necessitated by a particular code change, … The Jenkins polling rate is 2 mins.


Scott, Wes: Ideally, everyone should know the status of the system at all times. If someone commits to casacore and breaks gcwrap, then it is advantageous for the system to mail everyone, to increase the likelihood that someone in gcwrap can help out the person who broke stuff.


Darrell, Kumar: but that means that everyone has to watch all incoming Jenkins e-mails: people might start ignoring e-mails…


Majority decision (albeit with dissenters): the per-commit email notifications will continue to only go to the committer (as already happens).


Regression failures: everyone gets an e-mail.


One possible enhancement: we could diff between the latest regression report and the last one, which would still allow us to narrow down who broke stuff.


Check-in runs a small set of the RunUnitTest suite, but this will change.


Scott envisions a steady escalation from tests that run quickly to ones that take more and more time, for all tests, i.e. functionality tests, benchmarking tests, etc. A new function is not regarded as completed until tests for it have also been written.


General principle: developers should not be relying on the regression tests to catch problems.


Jeff: Most of the tool functionality should be tested at the /test/… level (but some of the tool stuff is in gcwrap - Dave), and the user interface stuff should be tested at the RunUnitTest functional test level.


There are 200 tests within flagdata. Sandra runs these overnight.




If the test is written in gcwrap, and you commit to code, then you wouldn’t see the problem until you test the interconnection with gcwrap.


A lot of gcwrap is just a wrapper around the C++ layer, so it is quite easy to write Python tests for this.


Decision: Testing of tools happens at the gcwrap-level Python layer.


Some components have a thick Python layer, and would probably need atomic tests for both the C++ and Python parts (or be fixed so that more is done in C++). For components that have thin Python layers, because the Python layers are basically just a wrapper, the tests more or less narrow the problem down to the C++ level straight away.


Jeff: The Agent flagger class uses the TfCrop agent. TfCrop relies on the VI/VB… You can run tests on the TfCrop agent…


Ideally, whenever you can type “make synthesis”, you should also be able to type “make synthesis test”, but this is expensive in terms of effort.


Requirement: All new code shall have a test committed with it that exercises it. The preference is that you test as low as you can, so that if you have just put in C++ code, you should put in a C++ test where possible.


If you have a small image, MS, etc. (even a couple of KB), it can be included at the atomic layer.


Jeff: Before you check something in, what do developers consider their obligations in terms of running tests? Urvashi, Sandra: Functional tests and affected modules. Kumar: If there is a “make test” option for a very small thing, run that and then rely on Jenkins. Lindsey will try to test as much as possible before checking anything in.


The baseline should probably be, at the very least, to run the (equivalent of) “make test”. For some things, this is simply not an option, however.


There should always be a link included to point to the appropriate test.


Why the split between gcwrap and code? Gcwrap was supposed to be primarily to access Python. We have separation between implementation and the interface layer. However, testing proposals are blurring that.


One extreme: do atomic tests (Kumar); the other extreme: run full regressions (Lindsey).


Baseline: Check something in, run the relevant module test, the functional test, and a small subset of the regression tests.




In every module, every directory, there should be a test that will do something reasonable, with a required dependency, and this should be a minimum requirement before checking code in.


From a practical point of view…


After gcwrap has been compiled, where does the functional test thing live? gcwrap/scripts. In the code directory, to tell people where to look, we should have a simple script that goes and finds the test script in the gcwrap/scripts directory and runs it. And these should always be run before check-in.


Action Item: The new tester is to put the full suite of functional tests into the nightly test run.



(Coffee)



Sandra Castro

RunUnitTest


(see slides, as usual)


Documentation is available at:


http://www.eso.org/~scastro/ALMA/CASAUnitTests.htm


Decision: Any new tests should indeed have an entry added to the active/gcwrap/python/scripts/tests/unittests_list.txt file.


Kumar



Test Framework


(No slides)


Currently at least 15-20 tests use this. Some are wrappers to regression testing scripts. The Jenkins testing is run via this mechanism. The comparative plots are not (currently) published, though.


These are comparative tests rather than absolute numbers.


The results are sometimes based on statistical quantities like RMS, and so it is possible that these could be tripped up by a bug introduced into imstat. They could probably be done using numpy…


Next… GUIs!


Darrell & Susan

Recent developments in the Viewer

Points raised during the presentation:



- Suggestion: Could display what the axes are in the summary window, in addition to the numbers of pixels.
- Suggestion: could implement the option to save a slice of an image as a subimage.
- Skycatalogue format files are quite nice for overlaying.
- LEL expressions are quite popular with users.
- Can automatically regrid (imregrid) the spectral axis between two images to match up the frequency planes. Uses the first image loaded as a template.
- Cursor mirroring between two open, corresponding windows with different, co-positional images. (Audience reaction: “Oooh!”)
- Can set a line (and width) and generate PV diagrams. Moving the cursor over the PV diagram plot will have the cursor shadowed on the originally specified line.
- It is possible to plot (say) four successive channels tiled in the same window. Suggestion: specify this from a script; it would be useful for the pipeline.
- Suggestion: Might be good to be able to specify a “stride”?
- Limits can be imposed on the start and end of the video playback (i.e. just play as a movie between (say) frames 5 to 23).
- The “Jump” option jumps between the minimum and the maximum pixel frames.
- New polyline generation from image slices; also does “ghost” cursor tracking.
- Histograms of intensities can now be plotted. The histogram tool will do a Gaussian or Poissonian approximation fit too.
- Spectral Profile tool: a number of options, velocity frames supported, line selection plotting.
- Image color mapping: can (e.g.) fill histograms, tweak the color, invert the color map, etc. A range can be placed on the histogram to restrict the colors.
- Also, can generate a list of sources found in the field as a text file.
- Work in progress (longer-term idea): be able to have multiple clean sessions running, and use the viewer as a client that could interact with and switch between the various processes.
- Also, build a mask that can be saved, and then applied.



Rest Frame


Kumar


(No slides)


We image in LSRK. LSRK: the systemic velocity does not change over (say) human lifetimes.


We have “rest frame defining measures”. There is now the requirement for tracking planetary and atmospheric objects, which move over the course of the observation. We have the ephemeris of the objects, so we need to track these sources and image them in the frame of the source. Or we could convert the data to a “fixed” frame (e.g. heliocentric) and image to that. The disadvantage of this latter approach is that only one time would be associated with the observation, which is not strictly correct.


Decision: the frame described above (i.e. comoving with the target field source) should be called “Source” instead, to avoid confusion.


An amplitude correction also needs to be made for Solar System objects. Should this be done in SetJy? Should it be done at the same time as the reference frame correction?


Action Item on Jeff: Work with Bryan Butler on the question of amplitude corrections associated with reference frame changes.


Does this also address the Nobeyama rest frame question? This could be done.


(End of the day)




Day 3


Thursday


Patrick Brandt


PlotMS


(Re: slides. You know the drill by now…)


It would be possible to plot SD data in plotMS.


There was some discussion about how to handle working with ScanTables vs. Measurement Sets. It might be expedient to write an interface to handle ScanTables. It would work with the existing architecture, but Kumar expressed concerns that such an approach would still be a hack. He felt that if SD data were to be viewed in PlotMS, it should be done properly (i.e. by working with MSs instead). This problem has (to some extent) already happened with the PlotMS architecture.


Kana: PlotMS currently does a lot of averaging, etc. These sorts of things would be done in the “Compute” box of the “plotMS DAL” slide, which would handle algebra, etc.


The scripting interface is a second type of client. Pipeline is interested in two forms of operation: scripting with no GUI at all, and scripting with some plotting, etc.


The “engine” layer in the plotMS engine slide is what handles generating the image. The image produced could perhaps be a bitmap for the Pipeline offline plotting. Vector plotting might be another option. In the GUI case, there is a back & forth between the client and the RendererFactory. This means that (say) zooming, etc. could be done in the client component, with just the information needed to repaint the image passed back. Darrell drew a similarity to VNC, which makes use of bitmaps at the client side: the thinnest possible thing is a bitmap.


Under the proposed architecture, the client does not have to own the state (but it could). The mapping between world coordinates and pixel coordinates always needs to be known by the client.


If we go to a pure bitmap model, then the design of driving the script with a GUI is inherently problematic. Essentially, once the user has decided to set up plotting by a script, then we will not let users subsequently change things by clicking around in a GUI. Or the user can start up with the GUI and click around.


Darrell: One option for the GUI client might be to have it effectively record a macro to a file, which could then be used by the user as a basis for creating subsequent plotting scripts.


Kana: Will the new plotMS still permit data flagging? If so, then this would need to be connected back to the data (wherever it is). The understanding is that for everything the user can plot in plotms, there should be a mapping back to the original dataset. This is sometimes tricky, as with plotting Stokes parameters. What will probably happen here is that some of the DALs (Data Access Layers) will have to be able to pass information back to the data.


The DAL work is fairly well advanced. The restructuring as per the plotMS Engine might hopefully be provisionally implemented by the end of the summer?


The new plotMS will probably be somewhat provisional for 4.2, and included in parallel with the existing one.


Urvashi: With suitable architecturing (is that a word?), the visbuffer developments would also help with the graphical interactive flagging.


George felt that fundamentally, the existing parameter set is conceptually largely there (although the current way they are implemented is not so great).


Jeff regards plotMS in essence as “a table plotter with a smart interface that knows about the Measurement Set”…


Some back-and-forth about how much relative effort should be spent on reworking the GUI. The general strategy will be to see how long it takes us to implement the “under the hood” structure first.



Jim

More on Plotting Architecture

(See slides; Jim had to step out, so Jeff took over)


Ad hoc plots: for the developers (Kumar). Kumar recalled that in AIPS++, there was a wrapper class for PGPLOT. Darrell had already written something just like this for CASA; Urvashi has used it for flag display.


One of the requirements was to be able to do GUI plots without dependence on a library, so that a developer does not have to require the use of the viewer package. It could be used as a developer tool, but could conceivably also be used by users.


Jeff was wary: if a developer has a plot that users do not, users will see them and want them… If we have a lot of plots scattered throughout the code, then it might be tempting to tweak those, rather than focusing on working on the other parts of the code. Rather than scattering plotting through the code, he would prefer that data to be plotted are generated as output somewhere, then picked up by a software “bus” (DBus!) and plotted using the viewer.


Kumar wants a common way to set up plots, perhaps with a few buttons. Plus, a parameter to suppress it when running in a headless environment. This is what Darrell has already written. Qt Designer XML extensions can be used to add buttons to the plotter. The plotting primitives that the current version supports are a little limited, so it could perhaps be expanded to be a more complete plotting library. It might be possible to connect everything up to the viewer plotting code, which would then lead to the same look and feel. This would take some work, of course… In principle, with the right design, it could accept the other interfaces.


Decision: We will use Darrell’s implementation of the plotter. We need to investigate a list of improvements to make it better, both in terms of features and architecture.


Jeff: so we have some code that will publish to the DBus thread. We could have a Qt client that then interacts with that. We could then also have another client that interfaces to DBus the same way, but might be PGPLOT, or just dump it to a file instead, etc. Another option: send the signal, and if there is a client listening, then it plots it, but if not, then it gets ignored.


Kana: Is there any documentation for Darrell’s plotting code? Urvashi will give a quick demonstration.


Darrell: Do we want the bus to be persistent? Could be configurable. This will be discussed later.


Action Item on the GUI group (Patrick, Susan, Darrell, Jeff and Jim): Follow up on what the architecture actually looks like, and how the viewer, plotms and these ad hoc plots all fit together.



Big Data


(See Jim’s slides again; Jim back at this point)


Darrell pointed out that users still need to be able to run on the local host. Jeff: agreed. We should design the system such that it can be separated across the network, even though we will not actually go ahead and separate them yet.



GUI and Plotting Package


More complicated things such as curvilinear coordinates, multiple axis labels, etc. might argue in favor of using an existing plotting package. For example, WCSLib with PGPlot has traditionally been quite attractive for some of this sort of work.


Qt/Qwt is one pair of GUI and plotting package; Java/Swing/FX/FreeChart is another.


A decision does not have to be made this week, but does need to be made soon.


FreeChart has update issues.


TkAgg is another possibility. It does natively support TrueType fonts. It would enable true GUI-less operation. It would need us to implement axis labeling, etc., so like Java, it would not be something we would lightly consider adopting.





TkAgg could perhaps be used to handle Qt interfacing. We would have to handle mouse clicks, etc., though.


Susan had surveyed the options in the C++ world, and Qt/Qwt currently seems to be the least-worst option. Only one person is maintaining Qwt, and the PyQt stuff has pretty much been abandoned.


Urvashi then demonstrated some of the features, syntax and conventions of Darrell’s plotting code described earlier. It is pure Qwt. Note that a main file and a proxy wrapper file need to be created.


Jeff thought that this solved the ad hoc plotting question posed by Kumar, but we should be aware of a steady stream of small, incremental additional requirements until the thing becomes unwieldy.



(Coffee)



General open discussion about “publication-quality” plots


Action Item on Jeff: Establish what is actually needed in order to achieve “publication-quality plots” (whatever that is).


It seems that Qwt is currently the most fully-featured of the C++ plotters, but there is not one package that does everything.


The three options are: Java/FX/FreeChart, Qt/Qwt and TkAgg. The second option raises problems for headless operation, the third option would require us to build a lot of things on top of it, and the first option would also be a lot of work, and some of the anticipated features are only anticipated at this point.



Going back to the rest frame handling



In clean, the user can specify selections on both input and output. There are currently many ways a user can inadvertently “hang themselves”… Sometimes channels apparently disappear, etc. due to accidentally mixing frames.


Clean will perform the clean either in LSRK or the SOURCE frame. When specifying the output frame, we need to make sure that what the user inputs is internally translated into what we actually work in, factoring in (e.g.) the different times covered, in order to avoid lines getting smeared, etc.


The proposal is to reduce the number of user-facing options, and then reject some data that are not likely to be used. For clean, there is a toolkit function that can advise the users about channels that should be used. Darrell: How about an option to do a dry run first and then get user approval?




Jeff: do we have a prohibition against tasks changing according to their own parameters? For example, having an “advise” mode, rather like the “inp” option in AIPS.

(Note: all the tasks have a GUI: “clean.gui”!)


Darrell: We could have a clean.inp-type structure...


If we had a class that implements clean, with (say) two methods, go and inp, that inherits from a task base class whose default inp() is just “pass”?


The user could be offered an object that the user could choose to accept or not. Essentially, this would be a helper function. (There are possibly other places where this sort of thing might also be used.) Question: where does this go into the tool stack? Could have an “advise” hook class, that could then be used by other tasks, as wanted. Maybe also useful for (say) imregrid, imfit, the sensitivity calculator?
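A hedged sketch of that idea (all names are hypothetical, not an actual CASA interface): a task base class provides inp()/go(), and an advise() hook returns suggested settings that the user may accept or ignore:

    class Task:
        def inp(self):
            # AIPS 'inp'-style listing of the current parameter values.
            for key, value in vars(self).items():
                print("%12s = %r" % (key, value))

        def go(self):
            raise NotImplementedError

    class Clean(Task):
        def __init__(self, vis, start=0, nchan=-1):
            self.vis, self.start, self.nchan = vis, start, nchan

        def advise(self):
            # e.g. suggest a safe channel range (a helper; nothing is enforced).
            return {"start": 2, "nchan": 60}

        def go(self):
            print("cleaning %s: start=%d, nchan=%d"
                  % (self.vis, self.start, self.nchan))

    task = Clean("my.ms")
    task.inp()
    vars(task).update(task.advise())   # the user chooses to accept the advice
    task.go()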



Jim


Development Tools


(See slides)


Can developers still go to a software conference once per year? In principle, yes, if useful and appropriate, subject to the usual budgetary constraints.


Could have periodic lunch talks on new languages, techniques, etc. How things work (like the GBT scheduler).


The new DAD should be in charge of arranging (say) monthly meetings along these lines.


Some discussion followed about code formatting. Tabs vs. four spaces, etc. in Python. There is a Python style guide; there are several for C++. A repository style is important.


Decision: No tabs for indentation! Blocks of four spaces should be used for indentation.



(Lunch)



Sandra and Justo


Parallelization and HPC


(See slides, tum-ti-tum…)


MMS = Multi-MS.
.




Takeshi: why do we need them (MMSs)?


Jeff: They are designed to do two things:

1. Hide from the user that we are doing this (farming out sub-MSs to a large number of nodes).

2. Not all of our tasks are (yet) happy with taking multiple inputs. This is a way to do this.


This structure also gets around write locks.


Michel is working on a filler that will also be able to run in parallel.


“numsubms” is included to allow more flexibility in dividing the data up into sub-MSs in ways that will work.


The reference sub-tables all have to have the same structure. For example, the spw table in a sub-MS might still contain references to three spws, even though the sub-MS itself might only contain data from one spw.


Imaging is driven by data selection, i.e. the top MS, not the sub-MSs.


Proposal: Instead of linking to the first one, replace the links with a copy, and then just flag stuff out of the sub-MSs as needed. Get rid of the links at the top (of the Multi-MS slide), and then in the sub-MSs, flag the stuff we do not need. This would not be a large addition in terms of data volume.


Consensus: the above is a nice idea, but not a high-priority item right now.


Decision: Look into the suggestion that if you open up the top-level MS with a write-lock, then a write-lock is imposed on all sub-levels as well. This is a higher-priority item.


The calibration task is (so far) not parallel. From the outside, the multi-MS looks like a single MS for that task. We will go back and look at parallelization in the future.


Kumar: Regarding the mstransform framework, we want to do cvel before channel averaging (to avoid smearing).


Regarding the quoted benchmarks vs. cvel/split: the numbers quoted are not taking advantage of parallelization either.


George wondered where in the scientific ordering calibration would fit, and was worried a bit about scientifically incorrect ordering. Mstransform has a fixed ordering, but that is not the case with the VI/VB framework.




Regarding the mstransform ordering: Dirk wanted to do time averaging after frequency transformations, but it was hard, so we’re currently doing it the other way around; for small amounts of time averaging, it might not matter, but for large amounts, you might have to run through mstransform twice in order to get the desired ordering. This is because the time-average iterator is not yet capable of handling the frequency transformation iterator as an input (but the aim is that this will be possible eventually).



Kumar


Status of Parallel Clean


(No slides available on this)


For cube imaging: linear speedup; for continuum: we do gain ~80% (modulo a few technical caveats). As a proof of concept, it seems to work well, so it will be implemented in the new framework.


Two levels of parallelization implemented:

Parallel clean.

Also the OpenMP threading, which can be set to parallelize up to 4 ways in a threaded manner.



Jeff


Requirements for Interprocess Communication




- Asynchronous notifications: these are not really being handled anywhere. Parallel clean requires asynchronous messaging.
- Service discovery
- Interface introspection
- Interface polymorphism (desirable)
- Message passing
- Object orientation
- Map/Reduce
- Multiple hosts
- Multiple OSes
- Authentication (desirable; including open vs. private networks)
- Languages: C++/Python
- Error handling (including transfer of error codes)
- General underlying principle: low bandwidth
- No need for proxies (i.e. for NAT/firewall traversal)
- Binding generation (multi-language), simple description (desirable)
- HPC/supercomputing facility support (desirable)
- Common; large user base; well-documented; well-supported
- Small memory footprint
- Free!






(Coffee)



Justo


Current Parallel Framework


(More slides available…)


Jeff: Difference between the casapy client and the casapy ipengine. Differences in what is started by default, and differences in some of the wrappers (the ones on the ipengines are lighter).


There are no Lustre-specific assumptions; in principle, it should work with any shared file system. All you need is a single file system for multiple nodes to access. As long as the furl files are shared, it should work. We do not support each node having its own individual file system, although this could still be scripted by a sufficiently knowledgeable user.


ZeroMQ supports complex messaging features, but we only use a relatively
simple subset.


The remote ipengines are running, and control is back with the user. If a given job is run asynchronously, this allows the user to regain control of the client without having to wait for all of the ipengines to have finished.


Jeff thought that on a simple cluster, it was the case that if the user calls checkJob with block=True, then the user has to wait until they are all finished (when used from the parallel task helper).


Generally, checking on a per-job basis gives the user more control, so that if an ipengine crashes, then a Traceback is obtained and the user can track back which one went wrong.
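
The per-job pattern is the same one provided by, for example, Python’s concurrent.futures; the sketch below is a stand-in illustration of the idea, not the actual CASA parallel task helper API:

    # Illustration only: per-job checking with concurrent.futures,
    # standing in for the CASA jobs/ipengine machinery.
    from concurrent.futures import ThreadPoolExecutor, as_completed

    def process_subms(name):
        # stand-in for the real per-sub-MS work
        if name == "subms_2":
            raise RuntimeError("simulated engine crash")
        return name

    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = {pool.submit(process_subms, n): n
                   for n in ("subms_1", "subms_2", "subms_3")}
        for fut in as_completed(futures):
            try:
                print(futures[fut], "finished:", fut.result())
            except Exception as err:
                # we learn exactly which job failed, and why
                print(futures[fut], "failed:", err)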


Question: If a user exits Python and re-launches it, does it hook back up to the controller?

Answer: No. Sometimes, if casapy fails, but (fairly) gracefully, then it will run destructors to shut down the controller. If it crashes badly, it may be left as a zombie.


Suppose (say) four ipengines have the same keys on all of them; then they are aggregated.

Jim: Are there remaining shortcomings? Justo: It works, and has been tested. Jeff: There are a few remaining things: it forces all parallelism at the Python level, none at the C++ level. There are no asynchronous notifications. The Map/Reduce mapping of jobs to ipengines is currently not built in. Wes: we have effectively been frozen into a particular version of iPython.

Sanjay: Is Hadoop an option for us? Jeff: The map/reduce framework, maybe. It is not currently the preferred technique for the clusters/supercomputers, but it can be discussed.






Darrell – DBus

(Slides available)


On OS X, the way we use DBus has been an annoyance for users. How should we be using it? Currently it is started by launchd, which starts a new dbus daemon for the user. There is no way to find out the bus “URL”. It should be done the way it is in Linux: start it up with casapy. On OS X we check to see if it is there, and even if it is owned by another user, casapy tries to use it.



Darrell – Pyro

(A few slides available)



Jim – MPI

(No slides used)


A low-level message passing scheme. The easiest thing to do with it is pass arrays around. A set of processes is established through mpirun; once it comes up, you can tell what the process ID (rank) is. Well accepted in the scientific and supercomputing world. A number of MPI wrappers already exist in CASA. It was used 10 or 11 years ago, but got neglected due to lack of available effort. It was done to handle serialization of images.
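
A minimal sketch of the “pass arrays around” usage, written with the mpi4py bindings (one of several Python MPI wrappers; no claim is made that CASA’s old wrappers looked like this):

    # Run with e.g.: mpirun -n 2 python demo.py
    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()      # the per-process ID ("rank")

    if rank == 0:
        data = np.arange(10, dtype='d')
        comm.Send([data, MPI.DOUBLE], dest=1, tag=0)
    elif rank == 1:
        buf = np.empty(10, dtype='d')
        comm.Recv([buf, MPI.DOUBLE], source=0, tag=0)
        print("rank 1 received:", buf)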



Wes – SAMP

(Slides available)


SAMP = Simple Application Messaging Protocol


Each host has a hub. TOPCAT has a hub built in (for example). Example: load a local image into ALADIN. It is a C library that is fairly easy to use, and uses XML-RPC underneath. In the example, TOPCAT acts as a hub and loads an image; it can’t handle it itself, so it sends out a message to all clients, and any clients that can handle loading the image (such as ALADIN) do so.


Supports Python, Java, JavaScript, Perl


Used by a number of pieces of astronomical software.


One would code an mtype (message type) and then a message.
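
As a hedged illustration of the mtype-plus-message pattern, here is a sketch using the astropy.samp Python bindings (not the C library discussed above); it assumes a SAMP hub (e.g. TOPCAT) is already running:

    from astropy.samp import SAMPIntegratedClient

    client = SAMPIntegratedClient(name="casa-demo")
    client.connect()                       # attach to the local hub
    client.notify_all({
        "samp.mtype": "image.load.fits",   # the mtype
        "samp.params": {"url": "file:///tmp/example.fits",
                        "name": "example image"},
    })
    client.disconnect()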




There is an implementation that we could grab and use that is supported by
NOAO.

The C library is in maintenance.


See also:
http://www.ivoa.net/documents/SAMP/


Wes had tried to incorporate a SAMP layer in the viewer together with DBus (two separate ports), but could not get SAMP to load the images.



Kumar – OpenMP

(See available slides)


Not really like the above messaging systems; it is essentially multithreading.

Decision: When using an OpenMP call, we should always ensure that default(none) is used (to avoid different threads accessing the same memory at the same time).


Private: no initialization (each thread gets its own uninitialized copy). FirstPrivate: everyone gets the initial value defined earlier.


The preference here is that whenever we write something threaded, we need to be looking at how many threads we have. The programmer should not assume they have the whole machine to themselves. Don’t use the environment variables, please.



Jeff – Discussion of Which Way to Go


Jeff: The SimpleGo approach used was OK to start with, but ssh key handling, etc. probably won’t work with supercomputing facilities. It would need a fair bit of reworking.


At the user interface level, DBUS has mostly worked OK so far. It can now span hosts and nodes, so it might work great, but such usage is new.


SAMP: It is coming and we will have to deal with it a bit (VO, etc.), but it does not appear to be designed for tightly-coupled systems.


Strawman: At the level of ipengine parallelization, from both Python and C++, MPI looks like the way to go. In the user domain, DBus has been working OK, and we stay with that (with a bit of reworking) for the client-client level. Use SAMP as our interface to incoming VO stuff. We could still build a CASA-VO bridge.


Scott: What about ZeroMQ?


Patrick: ZeroMQ is good for many of the listed requirements. It lacks introspection and service discovery. It has bindings for Python, C++, Ruby, etc. If you try to talk to a basic socket, you just fail; ZeroMQ can handle it much more elegantly: queuing up of messages, etc. It is most like a replacement for MPI, which is more or less the supercomputing standard at present. It has recently forked. The ZeroMQ protocol is not self-describing, so object serialization would need to be put in. This is also true of MPI.
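
A minimal pyzmq sketch of the behaviour described (the connecting side can send before the peer exists; the message is queued rather than the send failing as it would on a raw socket); illustrative only:

    import zmq

    ctx = zmq.Context()

    # The connecting side can send immediately: ZeroMQ queues the
    # message internally until the peer binds and the connection
    # completes.
    sender = ctx.socket(zmq.PUSH)
    sender.connect("tcp://127.0.0.1:5555")
    sender.send_json({"cmd": "run", "args": {"niter": 100}})

    # The receiving end binds afterwards and still gets the message.
    receiver = ctx.socket(zmq.PULL)
    receiver.bind("tcp://127.0.0.1:5555")
    print(receiver.recv_json())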


Action Item on Jim: Not just to look at ZeroMQ, but talk to Andrew Grimshaw and folk at Green Bank to get their takes on it as well.


Justo: The change is more worth the effort if we can obtain some result. We are currently very bound to the MMS structure. If changing, we should try to increase the level of parallelization, etc. One immediate win would be to avoid having to bounce back and forth between Python and C++ when doing asynchronous notifications to the ipengines.


Idea: the ipengines work as servers for communication, and remain able to communicate even while each of them is working on something. Right now, DBUS provides asynchronous communication while running parallel clean. DBUS is a good single line of communication, but communicating that out to other nodes is problematic right now, and being able to do that at the C++ level via (say) MPI/ZeroMQ would be a way to support that.


Wes does not like the idea of a SAMP-DBus translator. He thinks such a translator is an unnecessary additional layer of complexity. This might, however, then require that things would have to be written so as to be able to communicate over both SAMP and DBUS.


We will not be implementing this stuff for 4.2, but we should be thinking about
design.


We could change what we have layered on top of
iPython.


Justo: the proposal would require a bunch of stuff to be turned into a server. The present implementation limits this to only a few places.


Jeff: We can leave all the existing stuff in place, but in addition, just use MPI as an extra communication path to send the message to the nodes that “this is what I want you to run”. In other words, the goal is that the thread that is executing on the remote node is not the thread that is listening on the socket.


Could this also be used for logging? Currently, the logging for an ipengine is basically done as a redirection of the logging stream. They are all currently writing independently to individual files. Exceptions are being sent back across the link, but routine normal logging is not. If they were all told to write to the same file (which is currently possible), then the entries would get interleaved and mixed up. A separate communication channel would perhaps allow more versatile logging.


If the only user of heavy parallelization is the pipeline, then we probably already have enough? Or is the customer also going to be general users? For example, what is the expected lifetime of interactive clean? NRAO is planning on having lots of users log in and use the cluster. Jeff suspects that we are about 10 years away from having the vast majority of users trusting imaging by the pipeline (at which point we could realistically consider dropping interactive clean).


Another major point: if we want to move forward with iPython, then it has to be reworked anyway. (We have already used the new transport layer, but there are other changes we would have to adapt to.)


Getting out of using ssh keys is probably also necessary if we plan to run on anyone else’s clusters.


The DBus message would come in from the user. That then has to propagate up to Python (or we implement that DBus interface in Python). It could instead be handled by periodically checking state parameters in C++. These parameters are members that are accessible by the code that is actually doing the job (in the same process). The parameters are agents living in the tool; in effect, we use the Python tool to do it.


We could have a master clean and a bunch of subcleans, all communicating with the master clean via ZeroMQ, but still reusing the Python infrastructure to start everything. ZeroMQ can support this at the C++ layer. Ultimately, we could even get rid of iPython, if that were desired.


Urvashi: A quick and dirty solution might be as follows. A “stop” would always be useful for the major cycle. For the very limited minor cycle case, this could perhaps instead use a single, queued interrupt sent over the network via a signal; once the signal has been received, the job gets stopped by its internal handler the next time its internal status flag is checked and found to have been set.

Note that this approach does not preclude the use of a separate MPI communication channel as well.
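
A toy sketch of the signal-plus-flag idea (purely illustrative; in the real system the flag check would live inside the C++ minor cycle loop):

    import signal
    import time

    stop_requested = False

    def request_stop(signum, frame):
        # The handler only sets a flag; nothing is torn down here.
        global stop_requested
        stop_requested = True

    signal.signal(signal.SIGINT, request_stop)   # e.g. Ctrl-C

    for iteration in range(1000):
        time.sleep(0.01)        # stand-in for one minor-cycle step
        if stop_requested:      # checked at a safe point
            print("stopping cleanly at iteration", iteration)
            break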


Of course, none of these approaches address the issue of having it run on
supercomputers.


Darrell: We need requirements for
supercomputing. They might be somewhat
different from our usual simpler parallel work.


We might have to prototype up in MPI.


(End of the day)




Day 4 – Friday


Jeff Kern – Long Term Design for Tools and Tasks


There was some initial discussion regarding HPC. HPC is still in the job queuing model, which is not set up for interactive cleaning.


Users are unlikely to pay us for cluster time (if anything, they’d want grants!). Some users might be able to run it on their own institution’s cluster. Users at small colleges without a cluster may be able to buy a chunk of cluster time from (e.g.) Amazon.


Reprocessing data: is someone going to reprocess their dataset once? 100 times? If we are to provide facilities for having people do it, NRAO does not yet have policies in place to govern this.


The design aim is to make sure that this sort of thing is not designed out.


There was some brief discussion of XCEDE:
http://www.xcede.org/XCEDE.html



The Environment


First question: Where are all the files needed to start up? Probably, no-one could casually rattle off the full, accurate list in the right order…


The following list is mostly-sorta-kinda in order(-ish):




- Shell context is set up first:
  o In the development environment, in a bare shell, source casa.sh
- Casainit.sh
- Shell casa, casapy (Decision proposed: deprecate all but the first two)
- Library search path
- OS stuff
- CASApath => Path Arch
- Casa script (depends on current use)
- LD library path
- LD preload
- Debug environment
- MatPlotLib.rc (user’s home directory)
- PGPlot dir
- .Casainit
- .ops

Then we go to Python…

- Prelude.py <- loaded from casapy.py, which also sets up a few other things.
- Init.py
- Executes ipython
- python.rc <- Native Python
- aips.rc/casa.rc
- .aipsrc (2 processes)
- matplotlib.rc <- matplotlib shell environment. Matplotlib is actually loaded up by one of the other scripts (casa.py, I think).
- Viewer in casa/viewer.rc
- casa_inpy.py (Same as next?) <- Defines a list of tools that are instantiated at preload; loads tools and tasks. inp, go, etc. are defined. -> casa_inpy.py, casapy.py
- casa.py <- The Python include, so that someone can type “python import casa”. Also has to be updated with new tools. Uses the lowest level of the generated scripts.
- casapy.py <- Sets up the ipython-based client. Checks for the command line. Uses the _cli scripts.
- casa_in.py (some confusion with the above during discussion?) <- ipengine support; is a subset of casa.py, with different stuff for the ipengine setup. Task_pq (check parameters). Task.py
- casac.py
- MatPlotLib
- ASAP
- CASA definition version info
- Sourced by the casapy script:
  o casainit.sh
  o aips++local.sh (3 different locations…)
  o casainit


“Truly, you have a dizzying intellect.” -- Westley, “The Princess Bride”


There was general agreement that the above should be examined, requirements
extracted, and a simpler structure created.


Have one framework that delegates stuff off.


For example: Casa.py should be only the interactive client. Loading all the modules should be done in one place, not three. We should get the Prelude.py in as early in the process as possible.


We should have a .casa directory in each user’s directory, and include a matplotlib directory which specifically sets it up for casa use, without interfering with other uses of matplotlib.


Kana: Can possibly exploit various matplotlib.rc with different back-ends specified for users, pipeline use, etc.
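
A hedged sketch of what that might look like; the ~/.casa location and the per-profile rc file names are illustrative, not an agreed design:

    import os
    import matplotlib

    def load_casa_mpl_rc(profile="user"):
        # Pick a CASA-specific rc file from ~/.casa without touching
        # the user's general matplotlib configuration.
        rc_path = os.path.expanduser(
            os.path.join("~", ".casa", "matplotlibrc-%s" % profile))
        if os.path.exists(rc_path):
            matplotlib.rc_file(rc_path)
        if profile == "pipeline":
            matplotlib.use("Agg")   # non-interactive back-end

    load_casa_mpl_rc("pipeline")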


Jeff: We could say: “If the prelude exists, use it. Otherwise, we set it.” Darrell thought that this might enable users to get themselves into trouble: we should not allow users to redefine where casa looks to pick up matplotlib. Lindsey makes use of aliases for the start-up options.


Casa.py does not import matplotlib, which would mean that Pipeline is free to decide how it should be imported in the Pipeline environment.


Simplifying, we could have something (probably with renamed files) like:

- Casa.py, which gets us the tasks
- Casapy.py, which is the interactivity
- Casa_in.py: parallel; pg wrappers, etc.
- Prelude.py
- init.py, for initializing user and pipeline tasks
for initializing user and pipeline tasks






Multiple environments: developer, end user, production, pipeline, etc. Darrell: Should have different environment variables. Scott: OK, but would like to have the developer and user environments have as similar a start-up as possible. Darrell would find it a real pain to have to go through most of the LD preload stuff, rather than his minimal environment set-up for viewer work, which uses bits of CASA outside the full CASA environment. The scheme proposed by Scott would run the extra user stuff (extra environment variables, etc.) late in the process.

Two stages are needed: the Python cleanup and the shell cleanup.


Action Item on Jeff: Define the order in which these should be tackled.


Have two customizable init files: one at the start, and one at the end. If we do not
already have one, we should have a customization section in the CASA Cookbook.


We should not duplicate code if we do not need to, but some bits might crop up in generated scripts.


(Coffee)



Jim Jacobs – CASA and the Environment

(See slides)


Preferences: Things that both developers and end-users may be able to toggle on and off. The difference between these two would be published vs. unpublished…


Although we are minimizing the use of shell variables, we probably cannot entirely avoid path environment variables. OpenMP uses environment variables. Environment variables are intrinsically harder to keep track of (or to remember what they have been set to). Some discussion ensued about how best to set a one-off, non-persistent setting for a single session. There was some concern about users forgetting about .rc files with (e.g.) old matplotlib settings or something.

Decision: We should not create any new environment variables. Also, if we have
the choice between using an environment variable or an API, we should use the
latter.


Something to consider: some way to quickly get a handle on what all the preferences, library files, etc. are. It would help with (e.g.) user diagnostics. Maybe have a list of default values defined as well. We want to avoid having things silently being on (or not on): users should be notified when things change.


Asynchronous I/O is a system-wide thing that does belong in the preferences file.


We perhaps do not want to mix system-wide preferences with things like GUI preferences. In that case, Susan would like supporting structure in place so that writing the preferences out does not require her to write a lot of additional code.


Non-GUI case: a class living in the C++ process. It should also be able to do a “set” to update the value just for the current session, and a “write” which will export the current values.
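
A rough sketch of such a preferences object (the class name, file location, and JSON format are all hypothetical):

    import json
    import os

    class Preferences(object):
        def __init__(self, path="~/.casa/prefs.json"):
            self.path = os.path.expanduser(path)
            self.values = {}
            if os.path.exists(self.path):
                with open(self.path) as f:
                    self.values = json.load(f)

        def set(self, key, value):
            # session-only change; the file is not touched
            self.values[key] = value

        def write(self):
            # explicit export of the current values
            os.makedirs(os.path.dirname(self.path), exist_ok=True)
            with open(self.path, "w") as f:
                json.dump(self.values, f, indent=2)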


Darrell’s use case: run a viewer with particular settings, then run another one and expect it to pick up the settings of the first.


Action Item on all: Provide feedback on all of these items on the CASA SE wiki by August 1st.


Takeshi: the Python path environment variable defines the module load paths. But on Mac OS X, if he (e.g.) added his Python directory, it should be in sys.path after startup. Takeshi will create a Jira ticket for this, and assign it to Scott.


task_name.py <- method with tool calls. Jeff feels that all these tool calls should be private, not universal.


Jeff: If you know your tasks are using the globally instantiated tools, or are not
closing their tools when finishing, please fix it!


taskinit <- Actually an init for global tools. It builds up the list of casa tools that sit in the namespace.


Casa.py imports all of the tools into its namespace. Taskinit then instantiates
stuff.


Decision: taskinit to be reduced/renamed as globaltoolinit and be deprecated everywhere except the casapy.py section.




Casac is where all the tools live, and where they are imported from (anything
bound to Python).


The current correct practice is “from casac import imager”, and then myimager = imager().


Any rules as to how many functions a tool should have? No: do what
makes sense
from a software engineering point of view. If need be: make a tool and document
it.


As Mark works his way through the tools documentation, if something does not make sense (“Why is that there?”), then flag it.


Currently, most external folk who use the tools directly are the more sophisticated users.


Wes recently put in some code to enable the display of included examples, etc. in
the tools documentation.


“Rules about state setters”: no-one knew what this agenda item meant…


Return values: the decision was taken that at the tools level, we return tools from tools. Kumar advocated also being able to send tools (for some cases).


Tool exceptions should always propagate through Python (unless already handled at the C++ layer). Jim would still like to see the original exception, to gain insight into the original context. He had put code in to effectively do this.


Action Item: It would be nice to implement it properly, so that the Python error that gets generated has a slot so it can include the original exception (if so desired).
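
For what it is worth, modern Python now has this built in as exception chaining; a sketch (the ToolError class is hypothetical):

    class ToolError(Exception):
        pass

    try:
        try:
            1 / 0                  # stand-in for the low-level failure
        except ZeroDivisionError as err:
            raise ToolError("imager call failed") from err
    except ToolError as wrapped:
        # the original exception stays attached for inspection
        print("original exception:", repr(wrapped.__cause__))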


Jeff: For clean, we fundamentally write a function, which is where we put our code. Why is it a function, rather than a class stack? He would like to see a common class stack. Wes was not sure that this buys us anything. Darrell: if you look at the viewer, it creates a viewer_class, implements all that stuff…


There are a number of things it would be nice to take out of the CLI level, and put in instead at the casapy level. Wes: clean.py effectively already does this. History logging for the MS? It can be opened as writable or non-writable. David thinks that there are no established standards…

Minimum first standard: any function call that modifies the MS should report what it has done.
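
One hedged way to meet that minimum standard would be a decorator on the modifying functions; the names here (reports_history, write_history) are hypothetical, and a real version would have to hook into the actual MS HISTORY mechanism:

    import functools

    def write_history(vis, message):
        print("HISTORY[%s]: %s" % (vis, message))   # stand-in

    def reports_history(func):
        @functools.wraps(func)
        def wrapper(vis, *args, **kwargs):
            result = func(vis, *args, **kwargs)
            write_history(vis, "%s called with %r %r"
                          % (func.__name__, args, kwargs))
            return result
        return wrapper

    @reports_history
    def apply_flags(vis, antenna=""):
        pass   # stand-in for a function that modifies the MS

    apply_flags("my_data.ms", antenna="ea05")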


George advocated keeping history files
clean, as their usefulness drops with time.


Tool methods are the things that actually do the modification. The task level is what the user sees. Users call the tasks; the tools invoked should record what they are doing. What about helper functions? If you have the begin-task line, then you have the log entries of the tools changing things underneath that…


Plotms and the viewer open files initially as read-only, but when (e.g.) flagging starts, they reopen the file as read/write. Kumar advocated generally prioritizing opening files as read-only unless a file needs to be written to.


Jeff felt that this sort of logging should not go into the code-generated level, as there are then only a few people who can get at it. He would like it so that we automatically insert useful things into the history, so that devs could then modify it to make it more useful.


George and Sanjay argued for the possibility of logging at the tool level as well as the tasks, to aid expert users when writing things at the tools level. We could have a user preferences flag to indicate whether or not to log it.


We could just try to write to the history file whenever an MS is opened. If it is opened as “read only”, then that write attempt would just fail (which would be OK).


Differentiation:

- History: how the file(s) have been changed.
- Logging: a record of everything done/attempted during a session.

Should we propagate histories from calibrators over? It could get really messy and confusing…


Sanjay likes the .last file. George prefers to just look at the script that was run.


Lindsey: complicated histories in searchable archives are potentially a pain.


There is a requirement for histories. There is also a requirement that the logger
be scriptable.


It would be useful to be able to turn
history writing on and off. We can probably
do something fairly simple for logging, and if users are not satisfied, then we can
do a requirements capture.


Backwards compatibility

Can we reuse the XML? Could we, and does it make sense to, have a dictionary of parameters? (“From parameters import vis”, etc.)


Enforce that the toggles all work the same way, and ultimately that they all be handled in the code the same way. General support was expressed for this. The right thing would be to find the dictionary.




Question about helper functions: sometimes we want to hide them, as we don’t expect the users to invoke them. (This may be an exclusively SD issue, as the Python layers are thicker there.) Users can give a string, or an integer, or… The Python implementation converts the value from one type into another, and then passes it to a helper function. We would like to just show users the original function, but hide the helper function. Is there a mechanism available for this?


If users insist on calling the “__” methods, then they deserve all they get…
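
The usual Python convention covers much of this; a small sketch with hypothetical function names:

    def _coerce_spw(value):
        # helper: accept a string or an integer spw specification;
        # the leading underscore marks it as private, and most doc
        # and introspection tools hide such names by default.
        return str(value)

    def plot_spectrum(vis, spw=0):
        """User-visible function; only this appears in the docs."""
        spw = _coerce_spw(spw)
        print("plotting", vis, "spw", spw)   # stand-in for real work

    plot_spectrum("my_data.ms", spw="0:3~10")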


Some developers are using classes underneath the tasks, and some are not (which is OK). However, the actual function that gets called should generally make use of a class, rather than using long functions. (Or split it out as a method.)


Generally, the trend should be towards smaller, testable chunks, rather than
having huge, monolithic methods.


Guideline: No method should be longer than a screen (no smaller fonts, rotating screens…). OK, OK: none longer than 80 lines! Realistically, classes should top out at ~50-70 methods.



(Lunch)


Wes raised the topic of the long start-up time of CASA. Some of this is attributable to the loading of many things at startup. There are 672 Python files loaded into memory (which does not include any of the tests, regressions, etc.).

If the cache is in place, Kumar gets around 30s; without the cache, 2-3 minutes. Wes loads it in 24s from an SSD.


We could load on an as-needed basis. Wes suggested loading based on tasking, reading the XML files that describe the parameters. We would instead have a factory method. We would still load all the tools (all the C++ libraries, etc.), but the tasks would be loaded on demand.
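
A sketch of the on-demand idea via a factory (illustrative; the real version would presumably read each task’s XML description rather than just importing a module):

    import importlib

    _task_cache = {}

    def get_task(name):
        # Import (and cache) a task only when it is first used,
        # instead of importing all ~672 files at startup.
        if name not in _task_cache:
            module = importlib.import_module("tasks." + name)
            _task_cache[name] = getattr(module, name)
        return _task_cache[name]

    # Hypothetical usage: nothing for 'split' is loaded until here.
    # split = get_task("split")
    # split(vis="in.ms", outputvis="out.ms")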


The bare minimum casapy opens 256 files.


A note on the design pattern: in the longer term, look into shortening startup times and reducing the number of files loaded at startup.



Jeff Kern – Documentation

(See slides)


Task XML: Wes had produced a schema a couple of years ago.


Action on Wes: Document (probably via a wiki page) the description of the task XML.



We now get the description and then the example (this means that there is
currently some duplication of text).


Jeff claims that what we effectively have right now is:

    Example
    Description
    <Long parameter descriptions>
    |
    |
    Example

Propose instead: descriptions and long descriptions both live up near the top:

    Input
    Param
    Description
    Long Description


There is apparently currently a separate help par(nx) (for example). These should be autogenerated in the future as well.


Maybe have expert parameters, which are hidden unless the mode has been set? General consensus: probably not worth implementing hidden expert parameter options.


CASA Memo Series? Primarily for internal use. If we do it, will we actually write the memos? It would contain collated design principles documentation, maybe these minutes, etc.


Proposed way forward: have a page off the main CASA page? Jeff will provide a place for the memos to go. Build it once, take the static codes and link them.


Could also link to the telescope-specific technical documentation.


Sanjay would like to see doxygen linked in to the nightly
build as a high
-
priority
item
.



Time for a new group paper!


Big picture stuff, and then drill down into details in a second document? At least the big-picture article cannot be allowed to be stale: a living document?


Should live in SVN too.


Maybe also keep it separate from a more “political statement” publication: an ADASS paper or something, with a link to the living document?




Perhaps the way to start on this is to begin working on the living document, and Jeff can work on a ~10-page SPIE paper or something that would contain a link to the living document; the SPIE paper would then get updated/replaced ~every three years?


Action on Jim: Look at the project book and figure out how it and the architecture document fit together. Then, come up with an infrastructure, and tickets will go out to everyone asking people to update the germane sections.



(Coffee)


Final review of the meeting!