06-0149 - COBOLStandard.info


J4/06-0149 (WG4N0262)
Page 1 of 9
September 27, 2006

TO: J4

FROM: Ann Bennett, WG4 acting convener


Following is the response from the UK Head of Delegation to the WG4 informal HOD straw
poll regarding the XML TR. Please evaluate the comments and respond to the WG4 acting
convener.



UK Comments on the XML Proposal

The substantive part of the UK comments is in sections B) and C).

A) Introduction

There are substantial worries about this proposal that go beyond corrections to the
document. Since XML is of the utmost importance strategically, it is imperative to
bring it wholeheartedly into COBOL with some fundamental new concepts. The
proposal requires chopping an XML document into manageable pieces, using artificial
01 levels, that can be handled by old-fashioned COBOL, instead of elevating COBOL
to encompass the new Digital Revolution. The proposal is much too complex and,
because the low-level procedural operations interact in complex ways, there must be
a host of pitfalls and misinterpretations waiting to be discovered which will occupy the
committee for a long time hence and may never be adequately resolved. The
complexity will make the feature difficult to understand by present-day programmers and
will put its usability into very serious doubt.

This paper expresses the strong opinion that the use of file definitions for XML and the
hijacking of COBOL’s i/o verbs to parse a record which is already in memory are very
ill-advised and should be changed to a simple in-memory verb such as a move.

It is recognised that the proposal would benefit from the inclusion of dynamic
OCCURS and ANY LENGTH items but that the TR can only be applied against the
current standard.

As a result, some of the methods, which are only sketched in the examples, are
finicky in the extreme and will tie programmers in knots. It only needs one half
sentence (“except in files of organization XML”) to allow this flexible OCCURS, since
it’s already in the CD, and greatly simplify most of the procedures. The same applies
to ANY LENGTH items, which are mentioned in the proposal.

The XML feature went to a formal proposal very early and this makes it hard to see
how the features work amidst all the insertions into the Standard. There have been
few detailed working papers that pursue all the implications and illustrate how every
practical case is resolved. The examples in the Concepts part of the proposal are
much too brief and are flawed (see later section), giving the impression that the
proposed features have not been worked through. Without this kind of detailed
research, it is unknown whether the feature will ever work.


Despite these fundamental problems, there now follows a list of comments, as
requested by the recent response poll. Answers to the more minor concerns may in
fact be hidden in the proposal, which is now so complex that it is difficult to find one’s
way around it. Several of the major comments were made several years ago when
the proposal was fairly new, and must now be reconsidered in the light of recent
developments.

B) Major concerns.

1) A fateful decision of the proposal was to view XML as a type of file organisation.
Instead, it could have been viewed as just another way of representing
information, similar in effect to COBOL’s traditional way but using tags. XML data
is often passed via linkage, such as by an object method, or as a pointer, or
placed in working storage by some means, such as a CALL to a server routine or
a database. The proposal calls this an “in-memory” document. The dilemma for
the programmer is that he/she is forced to turn the 01-level record, conceptually,
into a “file” and navigate the complexity of all the file operations such as read,
write, rewrite, start with a bewildering range of optional phrases like invalid key.
Instead of defining the XML using data clauses, the programmer is in effect
forced to write a kind of “file handler” where the document structure is concealed
and, except in simple cases, is spewed out in fragments.

No doubt the reason behind the read-write approach was that it seems easier for
the implementor. This is a false economy because it means the difficulties are
shifted to the programmer. Ten implementors do a better job than ten thousand
programmers (especially the current breed), and the bugs only have to be
cleared once.

The final example in C.2 shows the problems. We have an 01-level containing
the document, assumed to be in memory. The proposal should have provided
something like this, using a complete view of the record and a MOVE to/from any
of its lower-level entries:

    01 book-document identified by "book-document"
          (some clause to say that this is an XML record).
       ... (some data)
       05 root-tag identified by "library".
          10 root-data pic x(80).
          10 books.
             15 book identified by "book" occurs dynamic
                   count in kount.
                20 book-data pic x(80).
       ... (some more data)

    MOVE books to conventional-COBOL-data

The record layout above is “raw” XML, not traditional COBOL contiguous data. It
can have many levels of hierarchy and it may be in any COBOL section. If the
record is in a file, a conventional read is done first. The move statement “knows”
that the sending item is XML and performs all the necessary parsing to find the
“book” data. (The implementor will of course optimise this process when more
than one move is used.) The move is doing what the read and write do in the
current proposal, except that the implied navigation is done by the logic behind
the move statement, rather than by the programmer.

As the current proposal stands, instead of that simple move, the programmer has
to provide the following:

- a SELECT ... ASSIGN for an in-memory file
- an FD statement in the File Section
- a record describing the root-tag data in a separate 01 record
- a file-level OPEN
- a document-level OPEN
- an element-level READ for root-tag
- a series of READs for book
- a series of MOVEs
- a document-level CLOSE
- a file-level CLOSE

The description of the document is lost because the data is viewed broken up
into many 01-levels and the logical organisation of the document vanishes in
complex procedure.

To pick a particular occurrence of the book, the proposal provides a START verb,
instead of simply moving a value to a subscript.

The proposal should therefore be changed to separate the physical file
operations and the logical parsing and generating of XML data. In fact very little
needs to be added at the physical level. XML data is usually held in what
COBOL would call a variable-length serial file, but it could be held within any kind
of existing file structure.

2) Items of indeterminate length, such as remarks, descriptions and comments are
essential to XML. It is mentioned in B.1 under Unresolved technical issues that
the ANY LENGTH clause will be available. However, ANY LENGTH is not
allowed in the File Section at the moment, and this restriction needs to be
removed now.

3) Items that occur (repeat) an indeterminate number of times are also essential to
XML. So, since any-length items are to be allowed in XML-type records, it’s even
more important to allow the OCCURS DYNAMIC clause in order to handle items
that repeat any number of times. OCCURS DYNAMIC is made for XML and
exactly corresponds with the DTD * (asterisk) symbol which says “this item is
repeated any number of times”. (And the minOccurs and maxOccurs attributes of
XML Schema correspond with the FROM and TO of the OCCURS clause.) The COUNT
(or CAPACITY) clause is already there to hold the number of occurrences
actually found. This is not a future enhancement: it must go in now with ANY
LENGTH. We don’t want programmers to risk hitting the limit and then having to
read tag after tag to find the remainder (see example below). The change can be
done in a half sentence!


4) COUNT clause.

The COUNT clause is essentially the same as the CAPACITY phrase of
dynamic-capacity tables, and it seems a pity that one of them is not changed to make them
the same keyword. When these big new features appear together in the next
Standard, it will look as though different parts of the committee disagreed.

For a non-occurring item it seems wrong to use the same COUNT keyword and a
“count” that can only be 0 or 1. Better would be to use a boolean item or add a
new COBOL condition such as identifier IS [NOT] PRESENT … or simply to state
that a blank (i.e. space-filled) item that is defined as optional will correspond to an
item omitted in the XML.

5) OCCURS DEPENDING. This should be prohibited in an XML record. XML
documents do not usually store counts as data items and we have the COUNT
(or CAPACITY) clause anyway. An ODO would conflict with it and there are too
many questions arising.

6) Absent items. It’s not clear what happens to an item that is absent in the XML
document (i.e. optional and omitted from the document or an excess occurrence
in the case of an OCCURS clause).

What goes into a COBOL data item on READ if the XML item is absent? Is the
data item initialized or space-filled? This should be made clear. (All the items in
a File Section record are changed after a READ so something must be placed in
all of them.)

There does not seem to be a requirement for a COUNT clause for items that
might be absent in the XML. On REWRITE, if a data item was absent but has no
COUNT and the program has not moved a value into it, is the item still absent in
the XML? This should be specified.

If a WRITE takes place (to insert new data into a document) and an alphanumeric
item defined as optional in the DTD contains spaces in the COBOL program but
there is no COUNT, is the item omitted from the XML document? (Hopefully yes.
It should not appear as a blank tag when it is not required.) This should be
explained.
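
The omission rule hoped for in item 6 can be illustrated outside COBOL. The following is a minimal Python sketch, not the behaviour defined by the proposal: the function name `write_record`, the field names, and the rule itself (a space-filled optional item produces no tag) are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def write_record(tag, fields, optional):
    """Serialise fixed-width fields, omitting optional ones that are space-filled.

    Hypothetical rule for illustration only; the proposal does not define it.
    """
    root = ET.Element(tag)
    for name, value in fields.items():
        if name in optional and value.strip() == "":
            continue  # blank optional item: emit no tag at all
        ET.SubElement(root, name).text = value.rstrip()
    return ET.tostring(root, encoding="unicode")


# "remarks" is optional and space-filled, so it is left out of the output.
record = {"name": "Smith     ", "remarks": "          "}
xml_out = write_record("customer", record, optional={"remarks"})
```

Under this assumed rule a blank item that is not declared optional would still be written as an empty element, which is exactly the distinction the memo asks the proposal to spell out.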

7) Ordering of items. Since XML depends on tags rather than contiguous ordering
(see the IBM document “Principles of XML design: when the order of XML elements
matters”) it should be stated explicitly that the “XML order” need not be the same
as the “COBOL order”. For example, “boy”, “boy”, “girl”, “girl” could appear in the
document as “boy”, “girl”, “boy”, “girl”, etc. unless intermediate XML elements
(“boys” and “girls”) are defined, or the DTD or schema specifies a sequence to
enforce the order.

As it stands, it’s not clear that the proposal can handle these different orderings.
For example, would a (group level) READ of the following data description work
in all possible orderings?


    01 children IDENTIFIED BY "children".
       03 boy occurs 2 IDENTIFIED BY "boy".
       03 girl occurs 2 IDENTIFIED BY "girl".

The proposal does not seem to rule this out but it should explicitly say that the
XML elements, in whatever order they arrive, are rearranged if necessary to the
order given in the COBOL record.

Again we have the question as to what happens to the order if the record is now
rewritten. (Is it retained, or re-ordered to follow the COBOL sequence? Is this
undefined? Does it matter?) Maybe it should be explicitly stated that COBOL
writes data in the order specified in its own data description.
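
The rearrangement asked for in item 7 amounts to grouping child elements by tag, in the order of the record description, whatever order they had in the document. A minimal Python sketch of that idea follows; the function name `regroup_children` and the sample names are hypothetical, and nothing here is taken from the proposal's rules.

```python
import xml.etree.ElementTree as ET


def regroup_children(xml_text, record_order):
    """Collect child element text by tag, in the order the record declares.

    `record_order` stands in for the sequence of IDENTIFIED items in the
    COBOL record (an assumption for illustration).
    """
    root = ET.fromstring(xml_text)
    grouped = {tag: [] for tag in record_order}
    for child in root:
        if child.tag in grouped:
            grouped[child.tag].append(child.text or "")
    return grouped


# The document interleaves boys and girls; the record lists boys first.
doc = ("<children><boy>Al</boy><girl>Bea</girl>"
       "<boy>Cy</boy><girl>Di</girl></children>")
result = regroup_children(doc, ["boy", "girl"])
```

Writing the groups back out in `record_order` would give the "COBOL order" on REWRITE; whether the original interleaving should instead be retained is the question the memo asks the proposal to settle.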

8) It’s not clear what the status of the element position vector is after an
unsuccessful READ. It should be stated that the element position vector is
“established for data-name-2” even though no data-name-2 was found. This is to
prevent the situation:

    (1) Read myxml element only root
    (2) Read myxml element rare-element
    (3) Read myxml element regular-element

For this code to be correct, another Read (1) should be inserted before Read (3).
But if “rare-element” is absent, the element position vector might be left
unchanged by some implementations so that Read (3) works as expected.
Then, on the rare occasion that “rare-element” is present, Read (3) will give an
“at end” condition, according to GR31, as though there were no occurrences of
regular-element present. In other words, this is a serious pitfall that
programmers may fall into, yielding incorrect results under certain true-life
conditions.

9) Characters > (greater than), < (less than) and " (quote). These are presumably
permitted within the contents and get converted automatically to &gt;, &lt; and
&quot; in the XML document. This should be stated. However, if RAW is
specified, for the purpose of building up from scratch or repairing an XML format,
a GT, LT or QUOTE might be a “semantic” one and need to stay as it is. (The
introduction to 13.16.28a IDENTIFIED clause may in fact say this.) But the
programmer using RAW may need to store a “non-semantic” LT etc. as &lt; etc.
This needs a general rule.
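
The conversion described in item 9 can be sketched in Python with the standard library's `xml.sax.saxutils`. The `raw` flag below is only a hypothetical stand-in for the proposal's RAW; the function names are illustrative. Note that `escape` also converts `&` to `&amp;`, which the memo does not mention.

```python
from xml.sax.saxutils import escape, unescape

# Extra mappings so that the double quote is converted as well.
QUOTE = {'"': "&quot;"}
UNQUOTE = {"&quot;": '"'}


def to_xml_text(value, raw=False):
    """Escape content for an XML document unless `raw` is requested.

    With raw=True the characters are passed through untouched, so any
    "semantic" < or > stays as markup (assumed meaning of RAW).
    """
    return value if raw else escape(value, QUOTE)


def from_xml_text(text):
    """Reverse the escaping when content is read back."""
    return unescape(text, UNQUOTE)


plain = 'price < 10 and name = "x"'
encoded = to_xml_text(plain)
markup = to_xml_text("<b>", raw=True)
```

The difficulty raised in the memo remains visible here: with `raw=True` the caller has to escape any non-semantic `<` by hand, which is why a general rule is needed.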

10) C.2, “slightly more complicated example”. This example will be confusing for
readers. The namespace yourname only occurs once and is rightly defined on
its own, but the attribute name and value would surely occur several times. So
why is there no OCCURS clause? Programmers will need to be told that they
have to obtain each one by successive READs. But this example should use
OCCURS … DYNAMIC to define the attributes:


    15 yourname PIC X ANY LENGTH.
    15 root-tag-attr-name OCCURS DYNAMIC.
       20 root-tag-attr-name PIC X ANY LENGTH.
       20 root-tag-attr-val PIC X ANY LENGTH.

11) C.2, final example (Consider an example to illustrate the COUNT clause). This
seems a very bad example and illustrates a problem with the syntax as it stands.
How does the program know that there are more than 5 occurrences of the tag
“book”? How does it know that it has to do that “subsequent read”? Of course, it
carries on reading if it gets a maximum count “just in case” and the next count will
be zero if there were exactly 5, but the example must explain this point and must
be generalised for any number. Note that in the following we cannot insert a
group item above “book”, because the item would not have an equivalent in the
XML document (and must have an IDENTIFIED clause by 13.16.28a.2 SR(2)),
and so we have to do single MOVEs which is ugly. Nevertheless, the example as
it stands must be changed to this (if it is correct):

    01 root-tag identified by "library".
       10 root-data pic x(80).
       10 book identified by "book" occurs 5 count in kount.
    ...
    01 ws-kount pic s9(4) comp.
    01 book-sub pic s9(4) comp.
    01 ws-book-table.
       10 ws-book occurs dynamic.
    ...
    Read myxml element root-tag            *> initial read
    Move 0 to ws-kount
    Perform until kount = 0
       Perform varying book-sub from 1 by 1
             until book-sub > kount
          add 1 to ws-kount
          move book (book-sub) to ws-book (ws-kount)
       end-Perform
       Read myxml element book             *> subsequent read
    end-Perform

The best solution is now to allow the dynamic syntax that is already in the Draft:

    01 root-tag identified by "library".
       10 root-data pic x(80).
       10 book identified by "book" occurs dynamic
             count in kount.
          15 book-data pic x(80).
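
The cost of the fixed OCCURS 5 version in item 11 can be seen by simulating it. The Python sketch below models each element-level READ as filling a table of at most five entries and returning a count; `read_chunk` and `collect_all` are hypothetical stand-ins, not the proposal's semantics. The loop mirrors the memo's "Perform until kount = 0".

```python
def read_chunk(elements, position, capacity=5):
    """Stand-in for one element READ into a table with `capacity` occurrences.

    Returns the entries read, their count, and the new position.
    """
    chunk = elements[position:position + capacity]
    return chunk, len(chunk), position + len(chunk)


def collect_all(elements, capacity=5):
    """Keep reading until a READ reports a count of zero."""
    collected = []
    position = 0
    chunk, count, position = read_chunk(elements, position, capacity)
    reads = 1  # the initial read
    while count != 0:
        collected.extend(chunk)
        chunk, count, position = read_chunk(elements, position, capacity)
        reads += 1  # a subsequent read, possibly returning nothing
    return collected, reads


books = [f"book-{n}" for n in range(1, 8)]  # seven occurrences
all_books, reads_needed = collect_all(books)
```

Seven books take three reads, and exactly five books still take two, the second returning a zero count. With a dynamic table the whole set would arrive in one operation, which is the point of the memo's preferred layout.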

12) 13.16.28a.2 SR(2): this says that COBOL cannot introduce its own group levels
that are not reflected in the XML. There is no reason for this and it is important to
remove this restriction.

Consider the layout:


    01 account IDENTIFIED BY "account".
       05 receipt occurs 10 IDENTIFIED BY "receipt".
       05 payment occurs 10 IDENTIFIED BY "payment".

Because of this unnecessary restriction, the programmer who wants to move all
the receipts or all the payments has to move them one-by-one. There is
absolutely no reason why he should not write:

    01 account IDENTIFIED BY "account".
       03 receipts.
          05 receipt occurs 10 IDENTIFIED BY "receipt".
       03 payments.
          05 payment occurs 10 IDENTIFIED BY "payment".

Here, receipts and payments are purely COBOL items which do not exist in the
XML (indicated by the lack of an IDENTIFIED clause). The programmer can now
do a MOVE receipts and MOVE payments etc.

13) 6.4 item [i] (REDEFINES clause). There is absolutely no reason why
REDEFINES should not be used, provided that a redefinition does not have an
IDENTIFIED clause at the same level or below. For example, if we want to break
up a 10-character code, the following should definitely be possible:

    01 cust-rec IDENTIFIED BY "cust-rec".
       03 cust-code IDENTIFIED BY "cust-code" PIC X(10).
       03 cust-code-2 REDEFINES cust-code.
          05 cust-code-head PIC 99.
          05 cust-code-middle PIC X(7).
          05 cust-code-tail PIC 9.

Only items with
an IDENTIFIED clause correspond to items in the XML
document, so the REDEFINES has a completely neutral effect.

14) Item [h3]. The OPEN and CLOSE DOCUMENT verb form is simply horrible. It
turns the OPEN (this form, at least) into a conditional statement and requires the
programmer to do two OPENs for the same name! The example in C4.4 brings
this home:

    open i-o quote-info
    open document quote-info
    ...
    close document quote-info
    close quote-info

A double OPEN goes right against the principles. Either the document should
have a different name from the file or the OPEN and CLOSE DOCUMENT
should be some other statement. Also, OPEN … AT END is out of the question.
OPEN does not produce an AT END!

The question should be resolved in the traditional way by means of a file status
code. (This OPEN is really a READ; see the introduction. But using files is the
wrong approach anyway.)

15) 12.3.4.16 The CHECK VALIDITY phrase should be on the OPEN or CLOSE verb
respectively (WITH VALIDITY CHECK). It is a procedural concept, it may not
need to be done on every open or close, and it may not be appropriate, such as
when OUTPUT is specified but the file is only opened for INPUT.

16) Invalid Key phrase. This goes against the principle that INVALID KEY refers to a
file that has KEY (as a keyword and as a concept). XML elements are not keys.
The phrase should be changed when we understand its purpose. The proposal
itself seems to neglect this phrase and rules need to be added (assuming we
have the latest version):

Item [p] REWRITE statement: There is no mention of imperative-statement-1 or
imperative-statement-2 in the GRs.

Item [w] WRITE statement: There is nothing in the GRs to describe the purpose
of the Invalid Key phrases and no mention of imperative-statement-1 or
imperative-statement-2.

C) Minor concerns.

1) Item 4: add definitions for these terms which are referred to continually:
namespace, attribute in an XML context, subdocument (mentioned only once
under CLOSE statement) and possibly CDATA (or add a passing reference or
explanation “Character Data (CDATA)”).

2) Item 6.1: Shouldn’t DISCARD and VERSION-XML be context-sensitive? (After
all, DOCUMENT is context-sensitive in the CLOSE statement.)

3) Item 6.1: END-OPEN is also a new context-sensitive word.

4) Item 6.5: EC-DATA-NOT-A-NUMBER is misspelt.

5) EC-XML-CODESET-CONVERSION is misspelt in two places as
EX-XML-CODESET-CONVERSION.

6) Item 6.5: under EC-XML-COUNT, “COUNT phrase” should be “COUNT clause”.

7) 12.3.4.17: 1.1 or 1.0 will not be written on CLOSE if the document was opened for
INPUT.

8) 13.16.15a COUNT clause, Note after GR4: this is a major processing rule and
should not be a Note.

9) Item 6.5 [c]: infinity is not available as a value in COBOL, so it surely cannot be
assigned to an item!

10) C.4 in the example: address and name are reserved words!

11) C.4 same example: it’s a good idea to explain why you would do this piecemeal
rather than simply using an OCCURS 20 and an OCCURS 50 with COUNT
clauses. Also explain why the element “root” has to be written separately.


D) Other Issues (for consideration)

1) Consider using DTD to specify the XML structure. Is new syntax needed for
something that already has established syntax, i.e. the DTD? If new syntax is
specified then:

a) it has to be able to specify the same things, so will, presumably, be isomorphic

b) any vendor is going to create a conversion program to map DTD onto the new
syntax.

2) Consider using XPath to navigate through XML.

3) Having rules to map an XML element (with sub-elements and attributes) to some
COBOL data structure would be very convenient (and save a lot of MOVE
statements), but it looks like you're jumping through a host of hoops you don't
really need to.

E) Conclusion

Markup Language processing is too important to accept this proposal as it stands. It
must be changed urgently to analyse the XML in memory only and in a schematic
(whole record view) way, with drastic simplification of the syntax, a few additions to
the data division to model XML structures, and a resultant vastly greater appeal and
acceptability to the public.

End of document.