Guide to the
ADL Gazetteer Content Standard

version 3.2

February 26, 2004


Responsible party:

Alexandria Digital Library Project

University of California, Santa Barbara

Santa Barbara, CA 93106

http://www.alexandria.ucsb.edu


Contents

1. Purpose
2. Overview
3. History of development
4. Core elements
5. Treatment of geospatial description
6. Treatment of temporal description
7. Links to external sources of information
8. Treatment of attribution to source of data
9. Relational database model
10. ADL implementation
11. Availability and contact point
12. Acknowledgements
13. References and links


1. Purpose

The ADL Gazetteer Content Standard (GCS) is designed to be a comprehensive
framework for recording descriptions of named geographic places. It covers the
core elements of toponyms (and their history), spatial location (in various
representations), and classification (according to referenced typing schemes),
together with source attribution for pieces of description gathered from
various resources for a particular place. The intention is to demonstrate the
use of the GCS and promote its adoption so that gazetteer data created by
various local, national, and international agencies, and by special knowledge
groups, can be shared and, when gathered from various sources, understood. The
GCS is designed to meet the needs of gazetteers containing current details of
named geographic places and the needs of gazetteers containing historical
data. It is designed to support international and multilingual applications
and to link to other sources of information about a particular place. As a
comprehensive structure for recording gazetteer descriptions, it can be
considered an “archival” structure. Implementations of it for gazetteer
services will include additional tables to support searching and report
generation functions.


An underlying purpose is to direct attention to the components of description
for named geographic places and to inform future developments of collections,
database design, and services that link current and historical toponyms (the
names we give to geographic places) to mappable locations (e.g., longitude and
latitude coordinates). Because a typing scheme is used to classify the
entries, such services can also answer queries such as “What schools are in
the Tucson area?”


A companion to the ADL GCS is the ADL Gazetteer Protocol
(http://www.alexandria.ucsb.edu/gazetteer/protocol/), which provides a standard
XML-based query and response structure for the machine-to-machine querying of
distributed gazetteers. The protocol and an open-source Java-based server
implementation are available through the ADL web pages. The protocol and the
GCS are independent structures.

2. Overview

The GCS, version 3.2, was developed as an XML schema. From this, a relational
database (rdb) logical model has been developed. An implementation has been
developed for the PostgreSQL database software, with additional tables to
support specific query matching and report generation requirements.


Sections of the GCS deal with

- Names and details of their origin, language, and use
- Classification (typing according to a referenced scheme)
- Codes associated with the place (e.g., FIPS code)
- Spatial location (bounding box and detailed geometries)
- Street address
- Relationships to other named places
- Data (e.g., population, elevation)
- Description (narrative)
- Links to external resources about the feature
- Other: supplemental note; entry metadata


A separate, companion XML schema is used to describe the contributors and
their sources for pieces of data included in a gazetteer entry.


Views and files of the GCS include the following:

- HTML graphics of the XML schemas (.html files)
    o GCS 3.2 (large file; please wait for it to load completely)
    o Source 3.2
- XML schemas (.xsd files)
    o GCS 3.2
    o Source 3.2
- Sample records (.xml files)
    o GCS 3.2 required elements and attributes only
    o GCS 3.2 all elements and attributes
    o Source 3.2 required elements and attributes only
    o Source 3.2 all elements and attributes
- Graphics of the relational database model are described below.


Time, attribution to source, and entry date are applicable throughout the GCS
to pieces of information gathered from multiple sources about a particular
place. Time is treated in a similar fashion to spatial location. Time can be
represented as a time range (similar to the bounding box), as detailed time
instances and ranges (similar to the spatial geometries), and also as a named
time period. The time period of the feature itself (e.g., for a school
building that no longer exists) as well as the time periods for names, spatial
footprints, data, and classification (e.g., a building changes its use from a
church to a school) can all be represented. A general temporal status is part
of the time period representation, with current, former, and proposed as the
three status values.


Attribution to source and entry date are represented in the XML schema as
applicable to sections of the description; e.g., for a particular placename, a
particular spatial footprint, a description, etc. In the rdb, this linking of
source to data has been extended to most of the attributes in the whole
gazetteer entry through the use of mirror tables, where the source of each bit
of data and its entry date can be represented.


The documentation of the source of pieces of information is structured as a
separate XML schema and is integrated into the rdb model as a discrete set of
tables with unique IDs for each distinct combination of contributor and the
contributor’s source of reference. Linking a particular piece of information
to a contributor and source is done with these source IDs.


The core elements (required elements of description) of the GCS are a small
subset of the whole GCS. In the XML schema graphic, required elements appear
in solid-lined boxes. For the rdb, we have created specific lite schema views
of the structure which can be used as a starting point.


3. History of development

The Alexandria Digital Library Project, which started in 1994, created the
first ADL Gazetteer early in the project. After a period of use and
experimentation, a formal structure was created for gazetteers (the first ADL
Gazetteer Content Standard), and the ADL Gazetteer was recreated using a
relational database implementation based on the GCS. Revisions to the first
GCS have been ongoing as a result of consultations with other potential
implementers. In particular, the requirements of historical and multilingual
gazetteers were contributed by members of the Electronic Cultural Atlas
Initiative (ECAI) at Berkeley. This version (3) is the result of intensive
review of the structure during the creation of the rdb logical model.


4. Core elements

A gazetteer record using only the required elements of the GCS might look like
the following. Please note that the record is presented here in a report
format with customized element labels and without entry dates and attribution
to source. The encoded geometry section is presented in XML format to make the
point that this section is represented by an externally referenced scheme.


feature ID: 12123434
feature status: current
name: Tucson (county seat)
    primary display: true
    name status: current
feature class: populated places
    primary display: true
    classification scheme:
        name: ADL Feature Type Thesaurus
        version: July 3, 2002
    class status: current
spatial location
    planet: Earth
    bounding box:
        geodetic basis: WGS-84
        west coordinate: -111.00278
        east coordinate: -110.86778
        south coordinate: 32.12278
        north coordinate: 32.26883
        how generated: calculated maximum and minimum extent of detailed geometry
        source geometry(ies): primary geometry
    geometry(ies):
        primary geometry: true
        geometry status: current
        reference link to external geometry: false
        geometry coding scheme:
            name: DLESE geospatial.xsd
            version: 1






        encoded geometry (example only):

            <geospatialCoverages>
              <geospatialCoverage>
                <body>
                  <planet>Earth</planet>
                </body>
                <geodeticDatumGlobalOrHorz>DLESE:WGS84</geodeticDatumGlobalOrHorz>
                <projection type="DLESE:Mercator">Information about the projection goes here.</projection>
                <coordinateSystem type="DLESE:Geographic latitude and longitude">Information about the coordinate system goes here</coordinateSystem>
                <detGeos>
                  <detGeo>
                    <typeDetGeo>Polygon</typeDetGeo>
                    <geoNumPts>5</geoNumPts>
                    <geoPtOrder>Clockwise</geoPtOrder>
                    <longLats>
                      <longLat longitude="-110.5" latitude="32.26883"/>
                      <longLat longitude="-110.86778" latitude="32.15"/>
                      <longLat longitude="-110.6" latitude="32.12278"/>
                      <longLat longitude="-111.00278" latitude="32.186"/>
                      <longLat longitude="-110.75" latitude="32.2"/>
                    </longLats>
                    <detSrcIDandURL URL="some URL">some source</detSrcIDandURL>
                    <detSrcDesc>Generalized polygon derived from shapefile</detSrcDesc>
                    <detAccEst>+/- 5 mile perimeter</detAccEst>
                    <description>Extra information goes here about the detailed geometry</description>
                    <detVert>
                      <geodeticDatumGlobalOrVert>DLESE:CGD28-CDN</geodeticDatumGlobalOrVert>
                      <vertBase>Average sea level</vertBase>
                      <vertMin units="feet (ft)">2410</vertMin>
                      <vertMax units="feet (ft)">2410</vertMax>
                      <vertAcc>Generalized point elevation for Tucson</vertAcc>
                    </detVert>
                  </detGeo>
                </detGeos>
              </geospatialCoverage>
            </geospatialCoverages>

entry date: 2000-07-01
modification date: 2001-05-15

In this example, the core gazetteer elements of the feature’s name,
classification, and spatial location are represented with some supporting
information. This is all that is required by the GCS. The full gazetteer entry
for this same place could include multiple placenames and details about each
placename; multiple feature classes, possibly from different classification
schemes; multiple spatial geometries from different sources or for different
time periods; and much more. For any particular gazetteer entry, a selection
of the non-required elements can be added.


Please note that some required elements can be treated as defaults; for
example, planet = Earth and status = current (if the portion of historical
information is minimal).
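The idea of defaulted required elements can be sketched in Python. This is only an illustration: the class name `MinimalEntry` and its field names are hypothetical and are not element names taken from the GCS schema.

```python
from dataclasses import dataclass


@dataclass
class MinimalEntry:
    """Hypothetical holder for a few required elements of a gazetteer entry."""
    feature_id: str
    name: str
    feature_class: str
    # Required elements that a gazetteer may choose to fill in by default:
    planet: str = "Earth"
    feature_status: str = "current"
    name_status: str = "current"
    class_status: str = "current"


# Only the non-defaulted elements have to be supplied explicitly.
entry = MinimalEntry("12123434", "Tucson (county seat)", "populated places")
print(entry.planet, entry.feature_status)
```

A gazetteer with a large share of historical entries would probably not default the status values, since former and proposed would then occur often.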


Also note that the encoded geometry shown above is an example (not complete)
to show how an external geospatial description standard can be used to
represent the encoded geometries needed for the gazetteer description.
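As a rough illustration of how software might read such an encoded geometry, the following Python sketch extracts the coordinate points from a shortened copy of the example fragment. The element and attribute names (`detGeo`, `longLats`, `longLat`, `longitude`, `latitude`) are copied from the example above; the function name `read_points` is an assumption, and the fragment is not a complete or validated DLESE record.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the example geometry (three of the five points).
FRAGMENT = """
<detGeo>
  <typeDetGeo>Polygon</typeDetGeo>
  <longLats>
    <longLat longitude="-110.5" latitude="32.26883"/>
    <longLat longitude="-110.86778" latitude="32.15"/>
    <longLat longitude="-110.6" latitude="32.12278"/>
  </longLats>
</detGeo>
"""


def read_points(xml_text):
    """Return (longitude, latitude) tuples for every longLat element."""
    root = ET.fromstring(xml_text)
    return [
        (float(pt.get("longitude")), float(pt.get("latitude")))
        for pt in root.iter("longLat")
    ]


points = read_points(FRAGMENT)
print(points)
```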


For links to sample minimum and full XML records, and for views of the schema
and XML files, see the list of views and files in section 2.

For views of the relational database model, go to section 9.


5. Treatment of geospatial description

Required:

- One detailed geometry (e.g., for a point, box, line, or polygon)
- One bounding box representing the maximum and minimum extent of the
  detailed geometry(ies)

Optional:

- Additional detailed geometries can be included. These additional
  geometries may represent the location
    o in different ways (e.g., a point, a polygon), or
    o come from different sources, or
    o represent a change in the extent through time (e.g., for an urban
      area)

Application:

- To the feature (i.e., to the named geographic feature that is the focus of
  the gazetteer entry)


The bounding box (aka minimum bounding rectangle) consists of the maximum
extent of the feature’s footprint on the Earth’s surface in terms of longitude
(east and west) and latitude (north and south). It is required to support
basic spatial query matching operations. Separate coordinates for each side of
the bounding box (e.g., west coordinate) are used so that there is no
confusion when the box extends across the 180º meridian.
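The value of separate west and east coordinates can be sketched with a small overlap test in Python. The sketch assumes a convention that is common but is not stated in the GCS itself: a box whose west coordinate is greater than its east coordinate crosses the 180º meridian. The function names and the (west, south, east, north) tuple layout are hypothetical.

```python
def lon_ranges(west, east):
    """Split a west/east pair into one or two ordinary longitude intervals."""
    if west <= east:
        return [(west, east)]
    # Assumed convention: west > east means the box crosses the 180º meridian.
    return [(west, 180.0), (-180.0, east)]


def boxes_overlap(a, b):
    """a and b are (west, south, east, north) tuples in decimal degrees."""
    a_west, a_south, a_east, a_north = a
    b_west, b_south, b_east, b_north = b
    # Latitude never wraps, so a plain interval test is enough.
    if a_north < b_south or b_north < a_south:
        return False
    # Longitude: compare every piece of one box with every piece of the other.
    return any(
        lo1 <= hi2 and lo2 <= hi1
        for lo1, hi1 in lon_ranges(a_west, a_east)
        for lo2, hi2 in lon_ranges(b_west, b_east)
    )


tucson = (-111.00278, 32.12278, -110.86778, 32.26883)
crossing = (177.0, -20.0, -178.0, -15.0)  # made-up box spanning the 180º meridian
print(boxes_overlap(tucson, (-112.0, 31.0, -110.0, 33.0)))
print(boxes_overlap(crossing, (-179.5, -18.0, -179.0, -17.0)))
```

If only a minimum and maximum longitude were stored, the second box could not be distinguished from one that wraps almost all the way around the globe.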


The specific elements of description for detailed geometries are not spelled
out in the GCS. Instead, the details of the geometries are to be expressed
according to a public geospatial representation standard, such as the
Geography Markup Language (OpenGIS), the FGDC’s Content Standard for Digital
Geospatial Metadata, or ISO’s TC 211 Geography Metadata standard. For the GCS,
this is an opaque description to be interpreted by the referenced geospatial
coding standard.


The detailed geometry representation can be included in the gazetteer entry or
it can be held external to the gazetteer database and referenced through a
URL. In either case, the documentation about the format of the representation
must be clear enough for correct computer interpretation.


Best practices for detailed geometries are that the following attributes be
included:

- geodetic basis (e.g., WGS-84)
- type of geometry (e.g., point, box, line, polygon, multi-polygon)
- set of longitude,latitude coordinate points with documented delimiters
- statement of uncertainty in terms of a plus and minus value (e.g., +/- 5
  miles)
- statement of uncertainty as a note
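To illustrate why documented delimiters matter, here is a minimal Python sketch that parses a delimited coordinate string and derives a bounding box from it. The delimiters (a comma within a pair, a space between pairs), the function names, and the sample coordinates are assumptions made for this example; the GCS does not prescribe them. The simple minimum/maximum calculation does not handle geometries that cross the 180º meridian.

```python
def parse_points(text, pair_sep=" ", coord_sep=","):
    """Parse 'lon,lat lon,lat ...' into a list of (lon, lat) float tuples."""
    return [
        tuple(float(value) for value in pair.split(coord_sep))
        for pair in text.split(pair_sep)
        if pair
    ]


def bounding_box(points):
    """Return (west, south, east, north) as the min/max extent of the points."""
    lons = [p[0] for p in points]
    lats = [p[1] for p in points]
    return (min(lons), min(lats), max(lons), max(lats))


pts = parse_points("-111.0,32.1 -110.8,32.1 -110.8,32.3 -111.0,32.3")
print(bounding_box(pts))
```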


For views of the schema and XML files, see the list of views and files in
section 2.

For views of the relational database model, go to section 9.


6. Treatment of temporal description


Required:

- Temporal status: current, former, or proposed

Optional:

- Beginning and ending dates for a general date range that spans the known
  duration
- Detail date descriptions that can include multiple representations of the
  associated dates, documentation of the uncertainty of knowledge of the
  dates, association with named time periods (e.g., the Middle Ages), and
  notes to explain unusual circumstances.

Application:

- To the feature itself
- To placenames
- To spatial location
- To classification (typing)
- To relationships between named geographic places
- To data associated with a named geographic place


In this version of the GCS, the temporal aspects of a gazetteer entry have
been designed to mirror the treatment of the spatial aspects. In both cases,
there is a generalized representation (the bounding box and the time range)
and detailed representations. Beyond this basic common high-level structure,
the treatment of time is distinct because time applies to many aspects of a
gazetteer entry and because often the beginning and ending dates are not
known, only that the time in question is current or former (e.g., historical)
or, to make the set complete, proposed (e.g., a shopping center).
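The combination of a required status with an optional date range can be sketched in Python as follows. The three status values come from the GCS; the class names, the field names, and the use of whole years are simplifying assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TemporalStatus(Enum):
    CURRENT = "current"
    FORMER = "former"
    PROPOSED = "proposed"


@dataclass
class TimePeriod:
    status: TemporalStatus            # required
    begin_year: Optional[int] = None  # optional: often unknown
    end_year: Optional[int] = None    # optional: often unknown

    def may_overlap(self, start, end):
        """True unless the known dates rule out overlap with [start, end]."""
        if self.begin_year is not None and self.begin_year > end:
            return False
        if self.end_year is not None and self.end_year < start:
            return False
        return True


school = TimePeriod(TemporalStatus.FORMER, begin_year=1880, end_year=1920)
undated = TimePeriod(TemporalStatus.CURRENT)
print(school.may_overlap(1900, 1910), undated.may_overlap(1900, 1910))
```

An entry with no dates at all still carries its status, so it can be included or excluded in a search on that basis alone.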


Also, for time there doesn’t seem to be an external standard for the
representation of time that covers the needs of the gazetteer. Therefore, a
descriptive structure for time representation has been designed for the GCS.
It includes the date range as a generalized temporal footprint, the statement
of uncertainty for the detailed times, and the association of named time
periods.


In anticipation that there will be web-accessible schemes, like gazetteers,
that define named time periods in terms of date ranges, the structure for
including named time periods allows for linking to an external scheme as the
source of the named time period definition.


In the GCS and its associated relational database, the time component is
normalized and linked to other components. That is, the treatment of time is
consistent wherever it is used in the gazetteer entry.


Best practice is to add whatever dates are known to be associated with the
feature or one of its descriptive aspects, even if the dates are not precise
(e.g., only expressed to the decade or the century). This information will
support some degree of searching and display by date range.
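One way an implementation might make imprecise dates searchable is to expand them into a year range. The Python sketch below is an assumption-laden illustration: the accepted text forms ("1850s", "19th century", a plain year) and the function name `to_year_range` are invented for this example and are not part of the GCS.

```python
import re


def to_year_range(text):
    """Expand an imprecise date label into an inclusive (first, last) year pair."""
    text = text.strip().lower()

    decade = re.fullmatch(r"(\d{3})0s", text)  # e.g. "1850s"
    if decade:
        start = int(decade.group(1)) * 10
        return (start, start + 9)

    century = re.fullmatch(r"(\d{1,2})(st|nd|rd|th) century", text)
    if century:
        n = int(century.group(1))
        return ((n - 1) * 100 + 1, n * 100)  # the 19th century is 1801-1900

    if re.fullmatch(r"\d{1,4}", text):  # a single year
        return (int(text), int(text))

    raise ValueError(f"unrecognized date label: {text!r}")


print(to_year_range("1850s"), to_year_range("19th century"), to_year_range("1912"))
```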


For views of the schema and XML files, see the list of views and files in
section 2.

For views of the relational database model, go to section 9.

7. Links to external sources of information

Where there are data sources that supplement the information included in the
gazetteer, the GCS provides elements that can be used to link to these
external resources. This version of the GCS provides the following linking
elements (all are optional and repeatable):

- linkNameInfo: link or reference to further information about the name, such
  as a scholarly document
- geometryReferenceURL: URL reference to a file that contains the coordinate
  points or other representation of geographic location, such as a grid
  representation, plus geodetic basis and geometry type (e.g., point, line,
  polygon, etc.). The file needs to be self-explanatory.
- featureLink: web address and description of a site that provides
  information about the feature; such links are given a description, a
  type/category, a language, and a URL.

8. Treatment of attribution to source of data

A basic tenet of the GCS is that there will be one gazetteer entry for a
particular named geographic location. That is, there will not be more than one
entry for the same place. Therefore, information about a place that comes from
different sources will be merged into a single record. It is important that
the source of the different pieces of information be traceable back to a
particular contributor and reference source.


Source identification consists of two parts:

- Contributor
    o Organization name and address; optionally a contact point and a
      website URL
- Source reference
    o Bibliographic reference for the reference source; e.g., a map, a
      book, etc.


Each ADL Gazetteer Source entry is uniquely identified with a mnemonic (e.g.,
“USGS-GNIS-1”) and by a system-assigned ID number. This ID number is
associated with individual pieces of data in a gazetteer entry.
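The assignment of one ID per distinct contributor and source reference can be sketched in Python. The class name `SourceRegistry`, its method, and the sample contributor and reference strings are hypothetical; the real ADL system assigns its IDs in the database.

```python
class SourceRegistry:
    """Hands out one ID per distinct (contributor, source reference) pair."""

    def __init__(self):
        self._ids = {}
        self._next_id = 1

    def source_id(self, contributor, reference):
        key = (contributor, reference)
        if key not in self._ids:
            self._ids[key] = self._next_id
            self._next_id += 1
        return self._ids[key]


registry = SourceRegistry()
gnis = registry.source_id("Example Agency", "Example names database")
atlas = registry.source_id("Example Agency", "Example printed atlas")

# Each piece of data in a merged entry carries the ID of its own source.
entry = {
    "name": ("Tucson (county seat)", gnis),
    "elevation_ft": (2410, atlas),
}
print(entry)
```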


In the rdb model, attribution to sources has been implemented through mirror
tables. The result is that attribution can be associated with each value (each
column of each row) in the main tables. This is an expansion from the basic
attribution included in the XML schema and provides a comprehensive solution
for tracing bits of information back to the contributor and reference source.
The mirror tables also include the entry date for each piece of information.
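The mirror-table idea can be sketched with Python's built-in sqlite3 module. All table and column names below are hypothetical and are not taken from the ADL relational model; the sketch only shows one main table paired with a mirror table that holds a source ID and entry date for each attributed column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE source (
    source_id INTEGER PRIMARY KEY,
    mnemonic  TEXT UNIQUE
);
CREATE TABLE feature_name (
    name_id    INTEGER PRIMARY KEY,
    feature_id INTEGER,
    name       TEXT,
    status     TEXT
);
-- Mirror table: same key as the main table, plus a source ID and an
-- entry date for each attributed column of that table.
CREATE TABLE feature_name_src (
    name_id           INTEGER PRIMARY KEY REFERENCES feature_name(name_id),
    name_source_id    INTEGER REFERENCES source(source_id),
    name_entry_date   TEXT,
    status_source_id  INTEGER REFERENCES source(source_id),
    status_entry_date TEXT
);
INSERT INTO source VALUES (1, 'USGS-GNIS-1');
INSERT INTO feature_name VALUES (10, 12123434, 'Tucson (county seat)', 'current');
INSERT INTO feature_name_src VALUES (10, 1, '2000-07-01', 1, '2000-07-01');
""")

# Trace the name back to its source and entry date through the mirror table.
row = conn.execute("""
    SELECT n.name, s.mnemonic, m.name_entry_date
    FROM feature_name AS n
    JOIN feature_name_src AS m ON m.name_id = n.name_id
    JOIN source AS s ON s.source_id = m.name_source_id
""").fetchone()
print(row)
```

Keeping attribution in a parallel table leaves the main tables uncluttered for searching while still allowing every value to be traced.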


For views of the schema and XML files, see the list of views and files in
section 2.

For views of the relational database model, go to section 9.

9. Relational database model

Graphics showing parts of and the whole relational database model

- GCS lite
    o All
- GCS full
    o Main (full model)
    o Feature name
    o Feature location (geospatial)
    o Core feature attributes (excludes name and location details)
    o Date/time
    o Source
- Parallel (mirror) tables for source attribution and entry date
    o Part 1
    o Part 2
- Spreadsheet holding column definitions
- Spreadsheet holding column descriptions


For views of the schema and XML files, see the list of views and files in
section 2.

10. ADL implementation

During the summer of 2003, the rdb logical model will be implemented as a DB2
database. Tables needed to support searching and report generation will be
added as needed. The existing ADL Gazetteer database will be converted to the
new schema and database model and the existing clients and services will be
moved to access the new database.


11. Availability and contact point

Links to the schemas and the relational database model are elsewhere in this
document.


The primary contact point for further information is Linda Hill,
lhill@alexandria.ucsb.edu.


12. Acknowledgements

The development of the ADL Gazetteer Content Standard and the implementation
of the ADL Gazetteer and its associated services have been funded primarily by
grants from the National Science Foundation through its Digital Library
Program. In addition, funds have been provided by NASA, ESRI, and the Digital
Library for Earth System Education (DLESE).


The ADL Gazetteer Development Team includes

Jim Frew
Jordan Hastings
Håvar Valeur
Linda Hill
Greg Janée
David Valentine


Pilar Montes developed the relational database model on a contracting basis with
ADL.


Many have contributed to the design and contents of the GCS through their
feedback to early versions. In particular, the Electronic Cultural Atlas
Initiative (ECAI) at Berkeley has given valuable advice in regard to support
for historical feature descriptions and multilingual text; Susan Stone has
critiqued the relational database model for us and given us valuable feedback.


13. References and links

ADL Gazetteer Development web page: http://www.alexandria.ucsb.edu/gazetteer/

ADL Gazetteer Protocol: http://www.alexandria.ucsb.edu/gazetteer/protocol/

ADL Gazetteer publications: http://www.alexandria.ucsb.edu/gazetteer/#pubs

GCS schema

Relational database model