
Data Management

Nov 29, 2012

PURPOSE:


It is not thought that you will ever need to do any of what follows,
but it is given for the record. It is assumed that you have looked
through 01INTRODUCTION and 02GLOSSARY, and have examined a single
generation SOURCE file, e.g., TWO07SOURCE.ods.


Field recording:


Field spreadsheets contained records for 2 generations: the emerging
nests and the new nests.


Special notations were developed to allow coding the contents of a
nest cavity in a single cell of a spreadsheet on a small-screen personal
data assistant. The code and its usage were elaborated over a few
years. To learn to read the records at a glance, see the 04DATA_CODING
directories, e.g., 04b_WORKED_EXAMPLES.


The following Python software expanded the records to simple, almost
full-pagewidth spreadsheets for direct analyses or incorporation into
databases.


Back up:


Make copies of all the original files and store them safely.
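This backup step can be scripted; a minimal sketch in Python (the directory names are placeholders, not paths from the original workflow):

```python
import shutil
from pathlib import Path

def backup_originals(src_dir, dest_dir):
    """Copy every file in src_dir to dest_dir, preserving timestamps."""
    src, dest = Path(src_dir), Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for f in src.iterdir():
        if f.is_file():
            shutil.copy2(f, dest / f.name)  # copy2 keeps modification times
            copied.append(f.name)
    return copied
```

Storing the destination directory on a separate disk or remote machine is, of course, the point of the exercise.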



Initial preparation:


The Excel or Lotus 1-2-3 field data spreadsheet was converted to an
OpenOffice ods file, e.g., TWO07field_1c.ods, and then to a 1-page csv
spreadsheet, e.g., TWO07field_1c.csv.
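The ods-to-csv conversion can also be scripted rather than done by hand. A sketch that drives LibreOffice in headless mode from Python (assumes the `soffice` executable is on the PATH; the filename is the example above):

```python
import subprocess

def ods_to_csv(path, outdir="."):
    """Convert an .ods spreadsheet to .csv via LibreOffice headless mode."""
    subprocess.run(
        ["soffice", "--headless", "--convert-to", "csv",
         "--outdir", outdir, path],
        check=True,  # raise if the conversion fails
    )

# usage: ods_to_csv("TWO07field_1c.ods")
```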



Installation of Python:


from www.python.org


Terminal window:


Open a terminal window and go to a work directory preloaded with the
modules, LABELS and COLOURS files.

Command-line entries are given in bold.


python


#import {{appropriate}} module.



MODULE tidy2genfield_latest.py:



Always open any module in a gedit text window in case one needs to
read the """documentation""" at the beginning of the module or at the
beginning of each major function, or look through the #comments for the
programming that leads to error messages. Always scroll through output
files looking for unexpected or missing column contents or for ERR error
messages.


NB. Modules always take csv files for input and produce them for
output. These csv files must have a constant number of columns
across all rows. Such csv files can be obtained directly from or
converted to ods spreadsheets. An edited csv file may not be accepted by
the Python module (mysterious failure of a function).
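When a module mysteriously rejects an edited file, the constant-column requirement is worth checking first. A quick sanity check (a sketch; the function name is mine, not from the modules):

```python
import csv

def check_columns(csv_path):
    """Verify every row of a csv file has the same number of columns.
    Returns the column count, or raises ValueError naming the bad row."""
    with open(csv_path, newline="") as f:
        rows = list(csv.reader(f))
    width = len(rows[0])
    for i, row in enumerate(rows, start=1):
        if len(row) != width:
            raise ValueError(f"row {i} has {len(row)} columns, expected {width}")
    return width
```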



import tidy2genfield_latest as ti


ti.split() #respond to prompts.



Edit the output file, checking for ERR messages.


Save this file as e.g., TWO07sheet_1c.csv. Convert it to
e.g., TWO07sheet_1c.ods and painstakingly colour it for improved
readability.


NB. The split() function is no longer needed, as all field data
have been split into single-generation spreadsheets with the records in
the last column W=22 (with A=0).
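For quick inspection, the coded records in that last column can be pulled out directly. A sketch (the function name is mine; the column index W=22, counting A=0, is as stated above):

```python
import csv

def last_column_records(csv_path, col=22):
    """Pull the coded nest records from column W (index 22, with A=0)."""
    with open(csv_path, newline="") as f:
        return [row[col] for row in csv.reader(f) if len(row) > col]
```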


If you plan to _completely_ revise the expansion of records, e.g., for
some other method of handling missing values, then you will be beginning
with a fully coloured and documented SOURCE.ods file. To redo this file
completely, cut out all columns beyond W, and all rows above the start of
the records (usually row 20, sometimes 24), and type into the terminal
window as follows.




ti.undodict(22) #and respond to prompts.


In the early record expansions species names (actually 3-letter
abbreviations) were assigned to nest names by hand using the
LABELS workbooks. Species names to nest names are 1:1 or 1:several,
depending on the species and variations in the nest. This function
removes the species names, placing the denuded records in column X=23.




ti.dodict(23) #respond.


The page 'dict' in the workbook LABELS_latest.ods provides the
new names. Convert the page to LABELS_latest.csv. A bug has crept in (to
the dict page?) so that it is often necessary to replace error messages
of the form, e.g., ti.diagnoseERR1_db by hand with olA_db (olA is Osmia
lignaria, db is the nest name, 2 letters from 'mud-using bee'). See the
LABELS file for species names and nest names.
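When the same hand replacement recurs across a file, it can be scripted. A sketch, assuming you collect the substitutions yourself into a dict (the one example pair shown is the substitution described above):

```python
def patch_err_labels(text, fixes):
    """Apply hand-chosen replacements for diagnose-ERR labels in csv text."""
    for bad, good in fixes.items():
        text = text.replace(bad, good)
    return text

# example, using the substitution described above
fixes = {"ti.diagnoseERR1_db": "olA_db"}
```

Read the whole csv as text, apply patch_err_labels, and write it back; then reopen it and check the result by eye as usual.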



Delete old columns (e.g. W and X), leaving the newly named species
column in column W=22. Call this file, e.g., TWO07dict_1c.csv.


NB. Note that this last column, W, is the fair copy of the
original record and contains the suffix q for, e.g., uncertain causes of
loss or slightly doubtful values, and Q for missing counts.




NB. It is unlikely that the above dict work will need to be redone.
Difficult species groups (leaf cutters, potters) are already split into
single species or other groups, allowing, if necessary, selection and
_recombination_ to suit the analyst's special bent.





MODULE sheetexpand1gen_latest.py:



import sheetexpand1gen_latest as sh


Search for comments on the global _Qswitch, which decides how one is
to handle missing data. I have used value 1 for generations 2001-2003 to
block extrapolation of the missing early nest stages (missing counts are
coded as Q). For later years I have used _Qswitch=1 because
interpolation or extrapolation is a minor problem. Modified records are
marked with a q or Q just before the &.


sh.tidyup() #respond.


This function adds a column X to the input spreadsheet, which is
a recoding of column W with interpolations for missing data
(substitutions for Q), removals of q, and additional codes Q and q added
to the 'dot' string before the '&'. In addition the terminal gives
summary statistics of modest utility (beginning 'Total data segments:')
for the number of changes made. If you intend to attend to this output
you really should first read the comments to the function tidy().


Search for ERR messages in output. See the tidyup documentation. Name
the output sheet, e.g., TWO07tidyup_1c.csv.
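Since the search for ERR messages recurs at every stage, it is worth a small helper. A sketch (the function name is mine; 'ERR' is the marker described above):

```python
def find_err_rows(csv_path, marker="ERR"):
    """Return (line number, line) pairs for rows containing an ERR marker."""
    hits = []
    with open(csv_path) as f:
        for n, line in enumerate(f, start=1):
            if marker in line:
                hits.append((n, line.rstrip("\n")))
    return hits
```

An empty list means no ERR messages; otherwise the line numbers point you to the rows to fix by hand.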




sh.advanced() #respond.


This expands column X (in the case of fortunately rare, heavily
nested cavities, to almost or even slightly more than the full width of a
spreadsheet!). Readability and continuity for parasitized or multiply
nested cavities are increased by padding letters that allow for less than
3 parasites per nest (mpl), or for nonnests or special records (ss).



Save the output file as e.g. TWO07source.csv. Search it for ERR
messages and funny output.


Convert to TWO07SOURCE_1c.ods and insert 19 or more rows to
restore the deleted header information.




Colour the expansion (columns Y upwards) by 'paste special
formats' from the file COLOURS49repeatY.ods. Check down all the green
columns for normal contents.


Colour columns A to W with the colours of the original, e.g.,
TWO07SOURCE_1c.ods file.


Relentlessly search for funny output. If found, correct the
column W entry and reexpand using the module functions above.



Nearly all the columns of the expansion contain single text or
number items suitable for data basing. Exceptions are (A) the causes of
loss columns, which contain text and numbers and may record multiple
causes of loss, and (B) the 'dot' strings, which represent dated nest
events. (A) and (B) can be simplified by find and replace before databasing
or perhaps handled by data base coding or a special Python reformat. As
to (B) the only event dates of analytical interest are f for female
(mother) and e for egg, because these are short-lived and thus
meaningfully timed events; the presence or absence of structural codes
(w,p,s,u,x) is also of interest but their dates are often very belated.
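A "special Python reformat" of the kind mentioned can be kept generic: copy the csv while applying a caller-supplied function to chosen columns. A sketch (function name and arguments are mine; which columns hold the causes-of-loss and 'dot' strings depends on the sheet):

```python
import csv

def reformat_columns(src, dest, cols, fn):
    """Copy a csv, applying fn to the cells in the given column indices
    (a generic hook for simplifying the causes-of-loss or 'dot' columns)."""
    with open(src, newline="") as fin, open(dest, "w", newline="") as fout:
        writer = csv.writer(fout)
        for row in csv.reader(fin):
            for c in cols:
                if c < len(row):
                    row[c] = fn(row[c])
            writer.writerow(row)
```

Any simplification, from stripping event letters to collapsing multiple causes of loss into one, then becomes a small fn passed in.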




As to data basing, PostgreSQL under Python control seems the optimal
modern approach.



MODULE anasheet1gen_latest.py:


This module was written because data basing is unnecessary for many questions,
though of great utility if handling all years and localities at once.



import anasheet1gen_latest as an


Cut off the descriptive header of usually 19 (sometimes 23) rows.



an.do() #respond. Name the output csv file and convert it to
e.g. TWONESTCOUNTS.ods


Colour it. Add titles. Add bore width data from the terminal.