Multimedia Database Design Document

bossprettyingΔιαχείριση Δεδομένων

28 Νοε 2012 (πριν από 4 χρόνια και 6 μήνες)

377 εμφανίσεις

Design Document

for an

Object Relational Weather Database


Project Report for ICS 691
-
4

Advanced Data Management
















submitted to:

Joan C. Nordbotten


submitted by:

Bill Luoma








December 13, 2002

Weather DB

Design Document


Bill Luoma

December 13, 2002



2

1.0 Introduction

This document seeks to mo
del an object
-
relational database system (OR
-
DBMS) that would be used in a
small television station to create, store and retrieve weather reports. The people involved with the weather at
the television station are modeled as descendents of the Person
-
as
-
we
ather
-
TV
-
station
-
employee entity,
there being three subclasses: Newscaster, Editor, and Meteorologist. Each person
-
role is responsible for
some aspect of the weather; the Meteorologist
creates

weather
-
related media, the Editor
assembles

that
media, and the

Newscaster
presents

it. Media may also be
created

by an Organization, such as the National
Weather Service. The other primary entity types are Creator, WeatherReport and MultimediaObject.


Since the database is primarily used for storing large objects an
d weather
-
related statistics, the data itself
will remain fairly static once inserted. Insertions will take longer than in a heavily
-
transaction oriented
database, which is often more concerned with updates, because a number of full text indices will need
to be
updated whenever a new record is inserted into the database.


Discussion of issues particular to an OR
-
DBMS will take place in the backdrop of PostgreSQL, the
implementation database; comments about this particular database server will appear through
out the report,
along with more thorough treatments in Section 6 and Appendix C.

2.0 Representative Queries

The following English language queries will give the reader and overall impression of the capabilities and
usage potential of the proposed database
design. Presentation and discussion of these queries is given in
Section 5 below.

2.1 Geographic Image Search

What is the url, title, and date of the satellite images of the Hawaii
-
central
-
pacific region that contain a
tropical storm or hurricane over the
past 90 days as well as the first and last names of the editor(s) involved
with these images?

2.2 Wave Height History

What were the recorded wave heights at sunset beach during the winter of 1998?

2.3 Beach Warnings

What Hawaii beaches had jellyfish war
nings over the past year?

2.4 Forecast Accuracy

What was the difference between the average winter temperature at South Point and the forecasted
average?

Weather DB

Design Document


Bill Luoma

December 13, 2002



3

2.5 Video Retrieval

Retrieve the satellite videos for the central pacific over the past month that sho
w a shear line
.

3.0 Semantic Data Model

A semantic data model of the proposed database is present in Appendix A. The model is presented in the
Structural Semantic Model (SSM) format (Nordbotten, 2002). According to Nordbotten, this model "was
developed as

a teaching tool and is an extension and graphic simplification of the EER modeling tool
presented in Elmasri & Navathe (2000)." It should be noted that, as an empirical subject of this teaching
experiment, I rate the model highly. I found the model especi
ally helpful, once I restrained myself from
thinking about databases mostly in terms of tables. The SSM forces the implementer to attempt to apply
semantics to particular entities and to create about and assert the meaning between objects. While I applaud

the foray into semantics, the journey is not without peril; I don't especially recommend it for the beginner.
Computer science has literally done wonders (and, arguably, horrors) in the service of a rigorous and
unfailing attention to the concept of synta
x, while at the same time ignoring semantics. It is not surprising,
given the relative difficulty in expressing the semantic meaning between entities in the SSM diagram itself,
that it is extremely challenging to index so called "unstructured" multi
-
media
documents based on semantic
criteria, and even more challenging to try to retrieve them. The problem of indexibility seems critical to any
complete theory of information retrieval, yet often the problems seem insurmountably intractable
(Hellerstein, 2002)
and at best capable of only fuzzy resolution. One would not be wrong to say then that
this is an exciting and rapidly changing intellectual landscape.

4.0 Implementation Plan

The implementation plan of the weather multimedia database is divided into sectio
ns based on entity, with
subsections arranged according to the following (possibly empty) categories: overview, attributes, indexes,
triggers, and functions.

4.1. Person Entity

This table represents a Person in the Weather MDB, specifically someone with
a role of employee at a TV
station. A person can have any number of Addresses and Contacts (e.g. phone #'s, email addresses, etc).
Each of the aforementioned person elements is a composite, repeating type. Person serves as the parent class
of Editor, Newsc
aster and Meteorologist, and a "surrogate" or partial parent of Creator.

4.1.1 Attributes

Name

Datatype

Constraint

Index

Remark

person_id

integer

pk

attribute


job_type

integer



One Editor, Newscaster, Meteorologist

job_title

vchar(128)




first_name

vchar(128)




last_name

vchar(128)




addresses

address[5]



array, composite

contacts

contact[5]



array, composite

Weather DB

Design Document


Bill Luoma

December 13, 2002



4

4.2 Meteorologist Entity

A Meteorologist is a subclass of Person, with the added capability of creating multimedia objects. See
surroga
te key discussion under 4.4 below.

4.2.1 Attributes

Name

Datatype

Constraint

Index

Remark

creator_id

integer

surrogate key

attribute

Parent of category sub
-
entity Creator

4.3 Organization Entity

An Organization is a corporate, government or educational e
ntity that can create multimedia weather objects
such as the National Weather Service. An Organization would typically be distinct from the TV station and
their employees. See surrogate key discussion under 4.4 below.

4.3.1 Attributes

Name

Datatype

Constra
int

Index

Remark

org_id

integer

pk

attribute


org_name

vchar(128)




addresses

address[5]



array, composite

contacts

contact[5]



array, composite

creator_id

integer

surrogate key

attribute

parent of category sub
-
entity Creator

4.4. Creator Entity

A

Creator is a category sub
-
entity or "union" between a Person
-
Meteorologist employed by the TV Station
or an external Organization. According to Elmasri & Navathe (2000), this kind of relation, when the parent
entities do not share a key, is best represent
ed by adding surrogate keys to the parents as well as a type field
to the child, which acts as a foreign key.

4.4.1 Attributes

Name

Datatype

Constraint

Index

Remark

creator_id

integer

pk

attribute


creator_type

integer

fk (Organization | Meteorologist)

a
ttribute


4.5 WeatherReport Entity

A WeatherReport is a complex media object consisting of one or more text, image or video MediaObjects.
Typically, a WeatherReport is assembled by an employee
-
editor from existing MediaObjects and presented
by an employee
-
newscaster; these two actions are represented by the Assembles and Presents relations.
While a WeatherReport must exist as "assembled" in the database, it need not necessarily be "presented" by
a Newscaster.

4.5.1 Attributes

Name

Datatype

Constraint

Inde
x

Remark

report_id

integer

pk

attribute


title

vchar(1024)


term existence

use tsearch for full text search

title_index

txtidx



indexing type for tsearch USING GIST

Weather DB

Design Document


Bill Luoma

December 13, 2002



5

date

timestamp


attribute


description

text


term existence

use tsearch for full text

search

description_index

txtidx



indexing type for tsearch USING GIST

keywords

text


term existence

use tsearch for full text search

report_uri

vchar(1024)



possible external file reference

report_mime_type

vchar(128)


attribute


keyword_index

txti
dx



indexing type for tsearch USING GIST

4.5.2 Indices

When using the tsearch() procedure, one must create indexes on all types declared as "txtidx". Here is the
appropriate syntax for the WeatherReport table:


CREATE INDEX idx_title_t ON WeatherReport U
SING gist(title_index);

CREATE INDEX idx_descr_t ON WeatherReport USING gist(description_index);

CREATE INDEX idx_keys_t ON WeatherReport USING gist(keywords_index);


Refer to Section 6 below for more information about the tsearch() procedure.

4.5.3 Trigge
rs

To use the "tsearch" procedure, typically one would need to implement triggers for updating title_index,
description_index, keyword_index indices when new records were added or existing records modified:


CREATE TRIGGER title_update BEFORE UPDATE OR INS
ERT

ON WeatherReport

FOR EACH ROW

EXECUTE PROCEDURE tsearch(title_index, title);


CREATE TRIGGER descr_update BEFORE UPDATE OR INSERT

ON WeatherReport

FOR EACH ROW

EXECUTE PROCEDURE tsearch(description_index, description);


CREATE TRIGGER keyword_update
BEFORE UPDATE OR INSERT

ON WeatherReport

FOR EACH ROW

EXECUTE PROCEDURE tsearch(keyword_index, keywords);

4.5.4 Functions

PostgreSQL provides two operators ("##" and "@@") for querying attributes of type "txtidx"; the former
operator performs an expansion
, the latter uses the query text as is. See Section 6 and Appendix C for more
detailed discussions.

4.6 MediaObject Entity

A MediaObject is the "abstract" parent class of various weather
-
related text and image objects. This class is
"pure virtual" in that

its payload is maintained in a subclass, either a TextObject, ImageObject, or
Weather DB

Design Document


Bill Luoma

December 13, 2002



6

VideoObject; participation in one of its subclasses is total. A MediaObject is related to a WeatherReport by
the Consists relation, wherein a WeatherReport consists of at least
one MediaObject and MediaObjects may
exist in none, one or more WeatherReports.

4.6.1 Attributes

Name

Datatype

Constraint

Index

Remark

obj_id

integer

pk

attribute


title

vchar(1024)


term existence

use tsearch for full text search

title_index

txtidx



indexing type for tsearch USING GIST

description

text


term existence

use tsearch for full text search

description_index

txtidx



indexing type for tsearch USING GIST

keywords

text


term existence

use tsearch for full text search

keyword_index

txtidx



indexing type for tsearch USING GIST

create_date

timestamp


attribute


relative_date

timestamp


attribute

for forecasts or postdated totals (e.g.
precipitation for past 24 hrs)

object_uri

vchar(1024)



possible external file reference

object_mime_type

vchar(128)


attribute


location

Location



UDT, composite geographical

advisory_level

vchar(32)


attribute


4.6.2 Indices

The following indices will be created for the MediaObject in a manner similar to the WeatherReport; see
4.5.2 above for details.

4.6.3 Triggers

The following triggers will be created for the MediaObject in a manner similar to the WeatherReport; see
4.5.3 above for details.

4.6.4 Functions

The location field is of type Location, one field of which is a box type. Boxes can be indexed

and queried
with the geometric operators @, ##, etc. See Appendix C for more information.

4.7 TextObject

A TextObject is a direct subclass of MediaObject. The media payload may be split into header, body and/or
footer. If no such structure is desired, the

body field can simply be used with NULL header and footer
fields. A TextObject is not necessarily pure virtual entity; that is, participation in one of its more
-
structured
subclasses is partial. A TextObject is a less
-
structured type of report, such as a
tropical storm over open
water.

4.7.1 Attributes

Name

Datatype

Constraint

Index

Remark

header

text



use tsearch

header_index

txtidx


term existence

indexing type for tsearch USING GIST

body

text



use tsearch

Weather DB

Design Document


Bill Luoma

December 13, 2002



7

body_index

txtidx


term existence

indexing

type for tsearch USING GIST

footer

text



use tsearch

footer_index

txtidx


term existence

indexing type for tsearch USING GIST

4.7.2 Indices

The following indices will be created for the TextObject in a manner similar to the WeatherReport; see 4.5.2
ab
ove for details.

4.7.3 Triggers

The following triggers will be created for the TextObject in a manner similar to the WeatherReport; see
4.5.3 above for details.

4.7.4 Functions

In addition to using the tsearch functions for full text indexing, one can al
so use POSIX regular expressions
on the text fields header, body and footer. See Section 6 for more information.

4.8 MarineObject

A MarineObject is a direct subclass of TextObject, representing a report of sea conditions near a shore
location.


4.8.1 Attr
ibutes

Name

Datatype

Constraint

Index

Remark

water_temp

real


attribute


high_tide

real


attribute


high_tide_time

timestamp


attribute


low_tide

real


attribute


low_tide_time

timestamp


attribute


wave_height

real


attribute


4.9 AtmosphericObject

An AtmosphericObject is a direct subclass of TextObject, representing a report of air conditions near a land
or sea location.

4.9.1 Attributes

Name

Datatype

Constraint

Index

Remark

temp

real


attribute


sky_conditions

vchar(32)


attribute


wind_speed

r
eal


attribute

knots

wind_direction

vchar(32)


attribute


precipitation_amt

real


attribute


precipitation_type

vchar(32)


attribute


precipitation_interval

integer



hours: rain in last 24 hrs

Weather DB

Design Document


Bill Luoma

December 13, 2002



8

4.10 ImageObject

An ImageObject is a direct subclass of M
ediaObject, representing some weather
-
related image. Subclasses
are partial and disjoint, existing as a simple type field: Generic, Radar or Satellite. A Generic image is
similar to a generic TextObject; it is not an abstract type. A more fined
-
grained def
inition is possible in the
future for the Radar and Satellite types, in which case they would be given their own distinct tables, as for
the MarineObject and AtmosphericObject types.

4.10.1 Attributes

Name

Datatype

Constraint

Index

Remark

image_type

vchar
(32)


attribute

subclass: generic, radar, satellite

caption

text


term existence

use tsearch

caption_index

txtidx



indexing type for tsearch USING GIST

image_data

text



binary image data

4.10.2 Indices

The following index will be created for the Imag
eObject in a manner similar to the WeatherReport; see
4.5.2 above for details.

4.10.3 Triggers

The following trigger will be created for the ImageObject to update the full text index in a manner similar to
the WeatherReport; see 4.5.3 above for details.

4.11 VideoObject

A VideoObject is a direct subclass of MediaObject, representing some weather
-
related video. Subclasses are
partial and disjoint, existing as a simple type field: Generic, Radar or Satellite. A Generic video is similar to
a generic ImageObj
ect; it is not a “pure virtual” type.

4.11.1 Attributes

Name

Datatype

Constraint

Index

Remark

video_type

vchar(32)


attribute

subclass: generic, radar, satellite

video_speed

real


attribute

frames per second

video_size

point


attribute

optimum viewing
resolution

caption

text


term existence

use tsearch

caption_index

txtidx



indexing type for tsearch USING GIST

video_data

text



binary video data

4.11.2 Indices

The following index will be created for the VideoObject in a manner similar to the Weathe
rReport; see 4.5.2
above for details.

4.11.3 Triggers

The following trigger will be created for the VideoObject to update the full text index in a manner similar to
the WeatherReport; see 4.5.3 above for details.


Weather DB

Design Document


Bill Luoma

December 13, 2002



9

4.12 Presents Relation

The Presents rela
tion creates a relation between a Newscaster and a WeatherReport. A WeatherReport can
exist in the database without being presented by a Newscaster.

4.12.1 Attributes

Name

Datatype

Constraint

Index

Remark

newscaster_id

integer

pk, composite with report_id
;

fk, references Newscaster id;

attribute


report_id

integer

pk, composite with newscaster_id;

fk, references WeatherReport id;

attibute


date

timestamp


attribute


4.13 Assembles Relation

The Assembles relation creates a relation between an Editor and
a WeatherReport. A WeatherReport can not
exist without being assembled by an Editor.

4.13.1 Attributes

Name

Datatype

Constraint

Index

Remark

editor_id

integer

pk, composite with report_id;

fk, references Editor id

attribute


report_id

integer

pk, compos
ite with editor_id;

fk, references WeatherReport id

attribute


date

timestamp


attribute


4.14 Creates Relation

The Creates maps a Creator to a MediaObject. A MediaObject can not exist without a Creator.

4.14.1 Attributes

Name

Datatype

Constraint

Index

Remark

creator_id

integer

pk, composite with object_id;

fk, references Creator id

attribute


object_id

integer

pk, composite with creator_id;

fk, references MediaObject id

attribute


date

timestamp


attribute


4.15 ConsistsOf Relation

The ConsistsOf
Relation allows one or more MediaObjects to compose a WeatherReport.

4.15.1 Attributes

Name

Datatype

Constraint

Index

Remark

report_id

integer

pk, composite with object_id;

fk, references WeatherReport id

attribute


object_id

integer

pk, composite with
report_id;

fk, references MediaObject id

attribute


Weather DB

Design Document


Bill Luoma

December 13, 2002



10

date

timestamp


attribute


4.16 Location Entity

A Location UDT is used to map a MediaObject to a particular place. The box type allows a more flexible
way to describe area than a point, allowing large
or small coverage, and bounding
-
box overlap queries with
the built in geometric functions (see Appendix C).

4.16.1 Attributes

Name

Datatype

Constraint

Index

Remark

loc_id

integer

pk

attribute


loc_name

vchar(512)


attribute


loc_area

box


attribute

inde
x USING RTREE

4.17 Address Entity

An Address is a UDT used to endow people and organizations with addresses.

4.17.1

Name

Datatype

Constraint

Index

Remark

address_id

integer

pk

attribute


street

vchar(1024)




city

vchar(256)




district

vchar(256)




country

vchar(256)




postal_code

vchar(32)




4.18 Contact Entity

A Contact is a UDT that attempts to represent any kind of contact or telecom number assignable to an
individual or an organization; example contact_types include phone, email, fax, page
r, and cell.

4.18.1

Name

Datatype

Constraint

Index

Remark

contact_id

integer

pk

attribute


contact_type

vchar(16)




contact_value

vchar(256)




5.0 SQL Queries

5.1 Geographic Image Search

What is the url, title, and date of the satellite images of the

Hawaii
-
central
-
pacific region that contain a
tropical storm or hurricane over the past 90 days as well as the first and last names of the editor(s) involved
with these images?


SELECT I.obj_id, I.title, I.create_date, P.first_name, P.last_name

FROM Edito
r E, ImageObject I, WeatherReport R, ConsistsOf C, Assembles A

Weather DB

Design Document


Bill Luoma

December 13, 2002



11

WHERE I.obj_id = C.object_id

AND R.report_id = C.report_id

AND A.report_id = R.report_id

AND E.person_id = A.editor_id

AND (I.obj_id IN (SELECT obj_id FROM ImageObject IO

WHERE IO.image_type
= 'satellite'

AND IO.location.area @ box(170, 30, 150,10)

AND (days(now()
-

IO.create_date)) <= 90

AND (IO.title_index ## 'tropical storm'

OR IO.title_index ## 'hurricane'

OR IO.description_index ## 'tropical storm'

OR IO.description_index ## 'hurricane'
))

ORDER BY I.create_date;


This type of query is supported by joining the attributes in the Editor, WeatherReport and ImageObject
tables. Location information is supplied through the location attribute of the ImageObject entity. Honolulu
could be found by

location name or by passing in a box type with latitude and longitude in a where clause,
since the geometric type "box" is included in the Location entity and PostgreSQL has many built
-
in
functions and operators to do this kind of lookup (e.g. @, &&, etc)
. See Appendix C for more information
about geometric datatypes and operators.

5.2 Wave Height History

What were the recorded wave heights at sunset beach last year?


SELECT M.obj_id, M.location.name, M.title, M.create_date, M.wave_height

FROM MarineObj
ect M

WHERE M.location.name ~* 'sunset' OR M.location.name ~* 'pipeline'

ORDER BY M.wave_height;


This query is supported through the MarineObject table. Location information is supplied through the
location attribute of the parent MediaObject entity. Sun
set Beach could be found by location.name or by
passing in a box or point type representing the latitude/longitude of Sunset Beach in a where clause.

5.3 Beach Warnings

What Hawaii beaches had jellyfish warnings over the past year?


SELECT T.obj_id, T.loca
tion.name, T.title, T.description, T.create_date,

FROM TextObject T

WHERE T.obj_id IN (


SELECT obj_id FROM TextObject TO


AND TO.location.area ?# box(170, 30, 150,10)


AND (days(now()
-

TO.create_date)) <= 365


AND (TO.keyword_index ## 'jellyfish'


OR
TO.description_index ## 'jellyfish'


OR TO.header_index ## 'jellyfish'


OR TO.body_index ## 'jellyfish')

Weather DB

Design Document


Bill Luoma

December 13, 2002



12


)

ORDER BY I.create_date;


This query is supported via the WeatherReport and TextObject, which have text fields that could be queried
for words simila
r to "jelly fish". This query could be more quickly speeded up by selecting only those
MediaObjects whose advisory_level was "warning". Location could be found by the methods mentioned in
2.2 & 2.3 above.

5.4 Video Retrieval

Retrieve the radar videos for t
he central pacific over the past month that show a shear line
.


SELECT V.obj_id, V.title, V.create_date,

FROM VideoObject I, WeatherReport R, ConsistsOf C

WHERE V.obj_id = C.object_id

AND R.report_id = C.report_id

AND (V.obj_id IN (


SELECT obj_id FROM

VideoObject VO


WHERE VO.image_type = 'radar'


AND VO.location.area ?# box(170, 30, 150,10)


AND (days(now()
-

IO.create_date)) <= 30


AND (VO.title_index ## 'shear line'


OR VO.title_index ## 'shear line'


OR VO.description_index ## 'shear line'


OR V
O.description_index ## 'shear line')


)

ORDER BY I.create_date;


This type of query is supported via the VideoObject entity in the MediaObject Hierarchy. The MediaObject
has a location field, which can be looked up vi the name of the location or by passing

in a point or box
representing latitude(s)/longitude(s). The VideoObject entity caption, title and description fields are indexed
for fast search.

6.0 PostgreSQL OR
-
DBMS Support

PostgreSQL is an open source or
-
dbms, evolved from Michael Stonebraker's POS
TGRES95 research
database at UC Berkeley. The current version (7.3) has sufficient SQL
-
3 features for handling multi
-
media
data types, these include spatial queries supported by the "@" and "&&" operators discussed above, as well
as support for full
-
text
indexing and regular expressions. In addition, the database is extremely flexible in
terms of allowing the creation of user
-
defined types and functions, as well as allowing shared object (.so)
libraries to be dynamically linked into the database server. Th
ese shared object files generally contain more
elaborate user
-
defined data types, functions and indexing strategies.

Weather DB

Design Document


Bill Luoma

December 13, 2002



13

6.1 PostgreSQL Storage Types

6.1.1 Large Objects

PostgreSQL supports embedded storage, through the "text" type and the lo_import and lo_exp
ort functions,
the so
-
called Large Object interface. A type of "text" used to have a max length of 8912 bytes (prior to v
6.1). In later versions, that restriction is lifted. In addition, the Large Object C function lib allow random
access to the lob data
in terms of "chunks", which the "text" type (also referred to as "TOAST" in the
documentation) does not. Finally, using the Large Object interface is not secure, because all users of the db
share a table called pg_largeobject. See the Create Type discussio
n in Appendix C.

6.1.2 Geometric Types

PostrgeSQL supports man geometric types, operators and functions. As we have seen, the box type is native
to PostgreSQL and can be indexed with a standard attribute index. Most of the geometric function operators
(e.g
. @, &&,<
-
>) native to PostgreSQL can be used on this datatype to query overlap, intersection,
containment and/or distance. See Appendix C.2, C.6, C.7 for more information on PostgreSQL's geometric
functionality.

6.1.3 Array Types

PostgreSQL allows columns

of a table to be defined as variable
-
length multidimensional arrays. Arrays of
any built
-
in type or user
-
defined type can be created. An array data type is named by appending square
brackets [] to the data type name of the array elements. The initializati
on syntax is not unlike the syntax for
populating structures in C.


INSERT INTO employee


VALUES ('Bill',


'{10000, 10000, 10000, 10000}',


'{{"meeting", "lunch"}, {}}');


Refer to Appendix C.4 for more information about PostgreSQL arrays.

6.2 Po
stgreSQL Indices

6.2.1 Create Index Synopsis

PostgreSQL supports indexing through the use the of CREATE INDEX command. PostgreSQL allows
customization of the indexing strategy with the "USING" keyword. "USING" takes one of the following
access methods as
an argument:



BTREE (the default), works well for structured, one
-
dimensional data



RTREE, works well on spatial data



HASH, purported to work well for one
-
dimensional equality testing, but generally unused



GIST, a generalized tree ADT


GiST stands for Ge
neralized Search Tree (Hellerstein, 1995). To create a GiST index, one must implement
an access method interface consisting of four functions: Consistent(), Union(), Penalty(), PickSplit(). GiST
performs slightly slower than the hand
-
coded b
-
tree or r
-
tree

counterparts, presumably because GiST is an
Weather DB

Design Document


Bill Luoma

December 13, 2002



14

abstraction encompassing any tree type; one often pays for convenience of abstraction with the currency of
performance.

6.2.1 Full Text Index

There are two such contributions to PostgreSQL implementing GiST are
especially relevant to term
indexing:

1)

tsearch, a full
-
text indexing scheme

2)

OpenFTS, an open source Full Text Search project (Bartunov, 2001).


These contributions can be registered for use by informing the database server of the path to the shared
obje
ct files of these add
-
on libraries. I have chosen to use the tsearch mechanism rather than OpenFTS
merely because the former was included with the database distribution.

6.3 PostgreSQL Functions

6.3.1 Posix Regular Expressions

PostgreSQL provides a number
of significant extensions to SQL2, including regular expression binary
operators for efficiently searching large text fields. These include the following:



(lhs ~ rhs) to specify that an operand is equal to the regular expression search pattern that follows



(lhs ~* rhs) to specify an operand is case
-
insensitive equal to the regular expression search pattern
that follows



(lhs !~ lhs) to specify an operand is not equal to the regular expression search pattern that follows



(lhs !~* rhs) to specify that an opera
nd is not equal case
-
insensitive to the regular expression pattern
that follows.

The format of the PostgreSQL regular expression is similar to a typical POSIX (grep) style pattern.

6.3.2 TXTIDX Operation

PostgreSQL provides two operators ("##" and "@@") f
or querying attributes of type "txtidx"; the former
operator performs an expansion, the latter uses the query text as is. Expansion is performed according to a
default dictionary based on a simple "Lovin" stemmer, which is reported to use a longest match a
lgorithm.
PostgreSQL provides a perl script to create custom dictionaries for those so inclined. It should be noted that
the "tsearch" procedure will use the same dictionary to expand/stem the term index. An example is shown
below:


SELECT title FROM Weath
erReport

WHERE title_index ## 'hurricane' OR title_index ## 'tropical storm';


One index field of type txtidx is an "inverted file" for every row of a table for one unstructured field (say
"description"). For example, given a table like the following, Pos
tgreSQL would actually maintain only one
"file" for the descr_idx, which contained all the words in every "description" entry in the table, ordered in
some kind of generalized tree structure with pointers to their corresponding ids and descriptions.


id (
int)

title (text)

description (text)

descr_idx (txtidx)

body (text)

0001

storm

This was a tropical
storm

pointer to inverted
file singleton

longstring1

0002

wind

This was a kona
pointer to inverted
longstring 2

Weather DB

Design Document


Bill Luoma

December 13, 2002



15

wind pattern

file singleton

0003

wave

This

was a rogue
wave that killed a
swimmer off of
Port Lock

pointer to inverted
file singleton

longstring3

0004

water
temp

This is always 80
degrees F.

pointer to inverted
file singleton

longstring4


This scheme would certainly speed up retrieval because ev
ery description field would not have to be linear
searched for the target text; one tree structure would be searched. The actual implementation of
txtidx

is
opaque to the user; the source code, however, is available to the overly curious. One would guess t
hat this
type stores the text of the field it is meant to in a stemmed and reduced fashion along with the id of the row,
as well as possibly other values representing position within the document and frequency of occurrence.


Another possibility is to ind
ex all the fields of interest relative to a particular table inside a single inverted
file; that is, include the title, description, and body in one index field of type
txtidx

(this is legal according to
the documentation). However, one would loose a littl
e granularity on the query (e.g. did the query hit the
title or the description or the body?); one would need to store that information in a coupled object inside the
database and parse its type if one wanted an answer to this question.

6.4 PostgreSQL Opt
imizing Query Planner

6.4.1 Genetic planner

PostgreSQL uses a query planner, based on a genetic algorithm , to devise a query plan for each request. It
is possible to control the query planner to some extent by using explicit JOIN syntax. According to the
documentation, when the number of join tables is above a certain compile
-
time threshold, "the PostgreSQL
planner will switch from exhaustive search to a genetic probabilistic search through a limited number of
possibilities. The genetic search takes less t
ime, but it won't necessarily find the best possible plan." (PGDG,
2001)

6.4.2 Index Optimizations

As mentioned above, PostgreSQL provides several index types: B
-
tree, R
-
tree, GiST, and Hash. Each index
type is more appropriate for a particular query type
because of the algorithm it uses. By default, the
CREATE INDEX command will create a B
-
tree index, which fits the most common situations. In particular,
the PostgreSQL query optimizer will consider using a B
-
tree index whenever an indexed column is involve
d
in a comparison using one of these operators: <, <=, =, >=, >. The PostgreSQL query optimizer will
consider using an R
-
tree index whenever an indexed column is involved in a comparison using one of these
operators: <<, &<, &>, >>, @, ~=, &&. Note only th
e bounding
-
box operators such as "&&" can take
advantage of the GiST spatial index. Functions such as distance() cannot use the index to optimize their
operation. Refer to Appendix C.2 for more information about the meaning of these operators.

7.0 Referenc
es

Elmasri, R. and S.B. Navathe (2000).
Fundamentals of Database Systems
, 3rd ed. Addison Wesley.


Nordbotten, Joan C. (2002).
ADM
-

Advanced Data Management
, mss.
http://nordbotten.ifi.uib.no/ADM/ADM_text/ADM
-
frame.htm

Weather DB

Design Document


Bill Luoma

December 13, 2002



16


Hellerstein, M., Koutsoupias E., Mi
ranker D., Papadimitriou, C. and Samoladas, V. (2002). "On a Model of
Indexability and its Bounds for Range Queries." JACM 49(1),
http://gist.cs.berkeley.edu/gist1.html#Indexability.


Hellerstein, J., Naughton, J. and Pfeffer, A. (1995). "Generalized Sear
ch Trees for Database Systems."
Proc.
21st Int'l Conf. on Very Large Data Bases
, Zürich, September 1995, 562
-
573.


The PostgreSQL Global Development Group (2001).
PostgreSQL 7.2.1 Documentation
.
http://www.postgresql.org/idocs/index.php?index.html.


Bartun
ov, O., Sigaev, T., Wickstrom, D. (2001). Open Source Full Text Search Engine (OpenFTS).
http://openfts.sourceforge.net/primer.html.

Weather DB

Design Document


Bill Luoma

December 13, 2002



17

8.0 Appendices
Weather DB

Design Document


Bill Luoma

December 13, 2002



18

Appendix A
--

Structural Semantic Model


MediaObject
ImageObject
TextObject
WeatherReport
ConsistsOf
Creator
Creates
Editor
Assembles
Organization
Person
U
Newscaster
Presents
Meteorologist
(t,d)
(p)
(t,d)
(0,n)
(1,n)
(0,n)
(0,n)
(0,n)
(0,n)
(1,n)
(0,n)
SSM I
Entity Relationships
VideoObject
MediaObject
ImageObject
TextObject
WeatherReport
ConsistsOf
Creator
Creates
Editor
Assembles
Organization
Person
U
Newscaster
Presents
Meteorologist
(t,d)
(p)
(t,d)
(0,n)
(1,n)
(0,n)
(0,n)
(0,n)
(0,n)
(1,n)
(0,n)
SSM I
Entity Relationships
VideoObject

Weather DB

Design Document


Bill Luoma

December 13, 2002



19

MediaObject
TextObject
(t,d)
ImageObject
MarineObject
AtmosphericObject
(p,d)
--
obj
_id
<integer>
--
ti tle
<
vchar
(1024)>
--
ti tle_index
<
txtidx
>
--
description
<text>
--
description_index
<
txtidx
>
--
keywords
<text>
--
keyword_index
<
txtidx
>
--
create_date
<timestamp>
--
relative_date
<timestamp>
--
object_
uri
<
vchar
(1024)>
--
object_mime_type
<
vchar
(128)>
--
location
<Location>
--
advisory_level
<
vhar
(32)>
--
header
<text>
--
header_index
<
txtidx
>
--
body
<text>
--
body_index
<
txtidx
>
--
footer
<text>
--
footer_index
<
txtidx
>
--
image_type
<
vchar
(32)>
--
caption
<text>
--
caption_index
<
txtidx
>
--
image_data
<text>
--
water_temp
<real>
--
high_tide
<real>
--
high_tide_time
<timestamp>
--
low_tide
<real>
--
low_tide_time
<timestamp>
--
wave_height
<real>
--
temp
real
--
sky_conditions
vchar
(32)
--
wind_speed
real
--
wind_direction
vchar
(32)
--
precipitation_amt
real
--
precipitation_type
vchar
(32)
--
precipitation_interval
integer
SSM II
Multimedia
Attribute Detail
VideoObject
--
video_type
<
vchar
(32)>
--
video_speed
<real>
--
video_size
<box>
--
caption
<text>
--
caption_index
<
txtidx
>
--
video_data
<text>
MediaObject
TextObject
(t,d)
ImageObject
MarineObject
AtmosphericObject
(p,d)
--
obj
_id
<integer>
--
ti tle
<
vchar
(1024)>
--
ti tle_index
<
txtidx
>
--
description
<text>
--
description_index
<
txtidx
>
--
keywords
<text>
--
keyword_index
<
txtidx
>
--
create_date
<timestamp>
--
relative_date
<timestamp>
--
object_
uri
<
vchar
(1024)>
--
object_mime_type
<
vchar
(128)>
--
location
<Location>
--
advisory_level
<
vhar
(32)>
--
obj
_id
<integer>
--
ti tle
<
vchar
(1024)>
--
ti tle_index
<
txtidx
>
--
description
<text>
--
description_index
<
txtidx
>
--
keywords
<text>
--
keyword_index
<
txtidx
>
--
create_date
<timestamp>
--
relative_date
<timestamp>
--
object_
uri
<
vchar
(1024)>
--
object_mime_type
<
vchar
(128)>
--
location
<Location>
--
advisory_level
<
vhar
(32)>
--
header
<text>
--
header_index
<
txtidx
>
--
body
<text>
--
body_index
<
txtidx
>
--
footer
<text>
--
footer_index
<
txtidx
>
--
image_type
<
vchar
(32)>
--
caption
<text>
--
caption_index
<
txtidx
>
--
image_data
<text>
--
image_type
<
vchar
(32)>
--
caption
<text>
--
caption_index
<
txtidx
>
--
image_data
<text>
--
water_temp
<real>
--
high_tide
<real>
--
high_tide_time
<timestamp>
--
low_tide
<real>
--
low_tide_time
<timestamp>
--
wave_height
<real>
--
water_temp
<real>
--
high_tide
<real>
--
high_tide_time
<timestamp>
--
low_tide
<real>
--
low_tide_time
<timestamp>
--
wave_height
<real>
--
temp
real
--
sky_conditions
vchar
(32)
--
wind_speed
real
--
wind_direction
vchar
(32)
--
precipitation_amt
real
--
precipitation_type
vchar
(32)
--
precipitation_interval
integer
--
temp
real
--
sky_conditions
vchar
(32)
--
wind_speed
real
--
wind_direction
vchar
(32)
--
precipitation_amt
real
--
precipitation_type
vchar
(32)
--
precipitation_interval
integer
SSM II
Multimedia
Attribute Detail
VideoObject
--
video_type
<
vchar
(32)>
--
video_speed
<real>
--
video_size
<box>
--
caption
<text>
--
caption_index
<
txtidx
>
--
video_data
<text>
Weather DB

Design Document


Bill Luoma

December 13, 2002



20

Appendix B
--

Create Statements

The following sample create s
tatements will serve to show how PostgreSQL handles composite types,
multivalued types, functions and inheritance.

B.1 Multimedia Hierarchy


create table location_t (


loc_id


serial,


loc_name


varchar(512),


loc_area


box

);


create function loca
tion ( integer, varchar(512), box )


returns location_t as


'insert into location_t values ($1, $2, $3);



'select * from location_t where loc_id = $1;'



language 'sql';


create table media_object_t (


obj_id



serial,


title



varchar(1024),


title_ind
ex


txtidx,


description


text,


description_index

txtidx,


keywords


text,


keyword_index

txtidx,


create_date


timestamp,


relative_date


timestamp,


object_uri


varchar(1024),


object_mime_type

varchar(128),


location


location_t,


advisory_level


varch
ar(32)

);


create table text_object_t (


header



text,


header_index


txtidx,


body



text,


body_index


txtidx,


footer



text,


footer_index


txtidx,

) inherits media_object_t;


create table atmospheric_object_t (


temp



real,


sky_conditions

varchar(3
2),

Weather DB

Design Document


Bill Luoma

December 13, 2002



21


wind_speed


real,


wind_direction

varchar(8),


wind_description

varchar(32),


precipitation_amt

real,


precipitation_type

varchar(32),


precipitation_interval

integer

) inherits text_object_t;

B.2 Person Hierarchy


create table address_t (


address_id


integer,


street



varchar(1024),


city



varchar(256),


district



varchar(256),


country


varchar(256),


postal_code


varchar(32),

);


create function address ( integer, varchar(1024), varchar(256), varchar(256), varchar(256), varchar(32) )


returns ad
dress_t as


'insert into address_t values ($1, $2, $3, $4, $5, $6);


select * from address_t where address_id = $1;'


language 'sql';


create table contact_t (


contact_id


integer,


contact_type


varchar(16),


contact_value


varchar(256)

);


create func
tion contact ( integer, varchar(16), varchar(256) )


returns contact_t as


'insert into contact_t values ( $1, $2, $3);


select * from contact_t where contact_id = $1;'


language 'sql';


create table person_t (


person_id


serial,


job_type


integer,


job
_title


vchar(128),


first_name


vchar(128),


last_name


vchar(128),


address
es


address_t[5],


contacts


contact_t[5]

);

Weather DB

Design Document


Bill Luoma

December 13, 2002



22


create table meteorologist_t (


creator_id


integer

) inherits person_t;


B.3 Creator Hierachy


create table organization_t (


org_i
d



serial,


org_name


varchar(128),


addresses


address[5],


contacts


contact[5],


creator_id


integer

);


create table creator_t (


creator_id


serial,


creator_type


integer

);


Weather DB

Design Document


Bill Luoma

December 13, 2002



23

Appendix C

Object Relational and Multimedia Syntax Features of PostgreSQL

This information has been culled from the PostgreSQL 7.2.1 Documentation (2001).

C.1 PostgreSQL Geometric Types

Geometric Type

Storage

Representation

Description

point

16 bytes

(x,y)

Point in space

line

32 bytes

((x1,y1),(x2,y2))

Infinite line

lseg

32
bytes

((x1,y1),(x2,y2))

Finite line segment

box

32 bytes

((x1,y1),(x2,y2))

Rectangular box

path

4+32n bytes

((x1,y1),...)

Closed path (similar to polygon)

path

4+32n bytes

[(x1,y1),...]

Open path

polygon

4+32n bytes

((x1,y1),...)

Polygon (similar to cl
osed path)

circle

24 bytes

<(x,y),r>

Circle (center and radius)


C.2 PostgreSQL Geometric Operators

Operator

Description

Usage

+

Translation

box '((0,0),(1,1))' + point '(2.0,0)'

-


Translation

box '((0,0),(1,1))'
-

point '(2.0,0)'

*

Scaling/rotatio
n

box '((0,0),(1,1))' * point '(2.0,0)'

/

Scaling/rotation

box '((0,0),(2,2))' / point '(2.0,0)'

#

Intersection

'((1,
-
1),(
-
1,1))' # '((1,1),(
-
1,
-
1))'

#

Number of points in polygon

# '((1,0),(0,1),(
-
1,0))'

##

Point of closest proximity

point '(0,0)'

## lseg '((2,0),(0,2))'

&&

Overlaps?

box '((0,0),(1,1))' && box '((0,0),(2,2))'

&<

Overlaps to left?

box '((0,0),(1,1))' &< box '((0,0),(2,2))'

&>

Overlaps to right?

box '((0,0),(3,3))' &> box '((0,0),(2,2))'

<
-
>

Distance between

circle '((0,0),1)
' <
-
> circle '((5,0),1)'

<<

Left of?

circle '((0,0),1)' << circle '((5,0),1)'

<^

Is below?

circle '((0,0),1)' <^ circle '((0,5),1)'

>>

Is right of?

circle '((5,0),1)' >> circle '((0,0),1)'

>^

Is above?

circle '((0,5),1)' >^ circle '((0,0),1)'

?#

Intersects or overlaps

lseg '((
-
1,0),(1,0))' ?# box '((
-
2,
-
2),(2,2))'

?
-


Is horizontal?

point '(1,0)' ?
-

point '(0,0)'

?
-
|

Is perpendicular?

lseg '((0,0),(0,1))' ?
-
| lseg '((0,0),(1,0))'

@
-
@

Length or circumference

@
-
@ path '((0,0),(1,0))'

?|

Is ve
rtical?

point '(0,1)' ?| point '(0,0)'

?||

Is parallel?

lseg '((
-
1,0),(1,0))' ?|| lseg '((
-
1,2),(1,2))'

@

Contained or on

point '(1,1)' @ circle '((0,0),2)'

@@

Center of

@@ circle '((0,0),10)'

Weather DB

Design Document


Bill Luoma

December 13, 2002



24

~=

Same as

polygon '((0,0),(1,1))' ~= polygon '((1,1),(
0,0))'


C.3 PostgreSQL Geometric Functions

Function

Returns

Description

Example

area(object)

double precision

area of item

area(box '((0,0),(1,1))')

box(box, box)

box

intersection box

box(box '((0,0),(1,1))',box '((0.5,0.5),(2,2))')

center(object)

poin
t

center of item

center(box '((0,0),(1,2))')

diameter(circle)

double precision

diameter of circle

diameter(circle '((0,0),2.0)')

height(box)

double precision

vertical size of box

height(box '((0,0),(1,1))')

isclosed(path)

boolean

a closed path?

isclosed
(path '((0,0),(1,1),(2,0))')

isopen(path)

boolean

an open path?

isopen(path '[(0,0),(1,1),(2,0)]')

length(object)

double precision

length of item

length(path '((
-
1,0),(1,0))')

pclose(path)

path

convert path to closed

popen(path '[(0,0),(1,1),(2,0)]')

n
point(path)

integer

number of points

npoints(path '[(0,0),(1,1),(2,0)]')

popen(path)

path

convert path to open path

popen(path '((0,0),(1,1),(2,0))')

radius(circle)

double precision

radius of circle

radius(circle '((0,0),2.0)')

width(box)

double precisi
on

horizontal size

width(box '((0,0),(1,1))')


C.4 PostrgreSQL Create Type Synopsis

CREATE TYPE typename ( INPUT = input_function, OUTPUT = output_function


, INTERNALLENGTH = { internallength | VARIABLE }


[ , EXTERNALLENGTH = { externallength |
VARIABLE } ]


[ , DEFAULT = default ]


[ , ELEMENT = element ] [ , DELIMITER = delimiter ]


[ , SEND = send_function ] [ , RECEIVE = receive_function ]


[ , PASSEDBYVALUE ]


[ , ALIGNMENT = alignment ]


[ , STORAGE = storage ]

)


CREATE T
YPE allows the user to register a new user data type with PostgreSQL for use in the current data
base. The user who defines a type becomes its owner. typename is the name of the new type and must be
unique within the types defined for this database. This
CREATE TYPE command is a PostgreSQL
extension. There is a CREATE TYPE statement in SQL99 that is rather different in detail.


CREATE TYPE requires the registration of two functions (using CREATE FUNCTION) before defining
the type. The representation of a
new base type is determined by input_function, which converts the type's
external representation to an internal representation usable by the operators and functions defined for the
type. Naturally, output_function performs the reverse transformation. The i
nput function may be declared as
taking one argument of type opaque, or as taking three arguments of types opaque, OID, int4. (The first
argument is the input text as a C string, the second argument is the element type in case this is an array type,
and th
e third is the typmod of the destination column, if known.) The output function may be declared as
taking one argument of type opaque, or as taking two arguments of types opaque, OID. (The first argument
Weather DB

Design Document


Bill Luoma

December 13, 2002



25

is actually of the data type itself, but since the o
utput function must be declared first, it's easier to declare it
as accepting type opaque. The second argument is again the array element type for array types.)


This example creates the box data type and then uses the type in a table definition:


CREATE

TYPE box (INTERNALLENGTH = 16,


INPUT = my_procedure_1, OUTPUT = my_procedure_2);

CREATE TABLE myboxes (id INT4, description box);


If box's internal structure were an array of four float4s, we might instead say


CREATE TYPE box (INTERNALLENGTH = 16,


INPUT = my_procedure_1, OUTPUT = my_procedure_2,


ELEMENT = float4);

which would allow a box value's component floats to be accessed by subscripting.


This example creates a large object type and uses it in a table definition:


CREATE TYPE bigobj
(INPUT = lo_filein, OUTPUT = lo_fileout,


INTERNALLENGTH = VARIABLE);

CREATE TABLE big_objs (id int4, obj bigobj);

Arrays

Whenever a user
-
defined data type is created, PostgreSQL automatically creates an associated array type,
whose name consists of the

base type's name prepended with an underscore. The parser understands this
naming convention, and translates requests for columns of type foo[] into requests for type _foo. The
implicitly
-
created array type is variable length and uses the built
-
in input a
nd output functions array_in and
array_out.


You might reasonably ask "why is there an ELEMENT option, if the system makes the correct array type
automatically?" The only case where it's useful to use ELEMENT is when you are making a fixed
-
length
type tha
t happens to be internally an array of N identical things, and you want to allow the N things to be
accessed directly by subscripting, in addition to whatever operations you plan to provide for the type as a
whole. For example, type name allows its constit
uent chars to be accessed this way. A 2
-
D point type could
allow its two component floats to be accessed like point[0] and point[1]. Note that this facility only works
for fixed
-
length types whose internal form is exactly a sequence of N identical fields.
A subscriptable
variable
-
length type must have the generalized internal representation used by array_in and array_out. For
historical reasons (i.e., this is clearly wrong but it's far too late to change it), subscripting of fixed
-
length
array types starts
from zero, rather than from one as for variable
-
length arrays.

C.5 PostgreSQL Create Function Synopsis

A CREATE FUNCTION command is defined in SQL99. The PostgreSQL version is similar but not
compatible. The attributes are not portable, neither are the dif
ferent available languages.


CREATE [ OR REPLACE ] FUNCTION name ( [ argtype [, ...] ] )


RETURNS rettype

Weather DB

Design Document


Bill Luoma

December 13, 2002



26


AS 'definition'


LANGUAGE langname


[ WITH ( attribute [, ...] ) ]

CREATE [ OR REPLACE ] FUNCTION name ( [ argtype [, ...] ] )


RETUR
NS rettype


AS 'obj_file', 'link_symbol'


LANGUAGE langname


[ WITH ( attribute [, ...] ) ]


This example creates a function that does type conversion between the user
-
defined type complex, and the
internal type point. The function is implemented
by a dynamically loaded object that was compiled from C
source (we illustrate the now
-
deprecated alternative of specifying the exact pathname to the shared object
file). For PostgreSQL to find a type conversion function automatically, the SQL function has
to have the
same name as the return type, and so overloading is unavoidable. The function name is overloaded by using
the second form of the AS clause in the SQL definition:


CREATE FUNCTION point(complex) RETURNS point


AS '/home/bernie/pgsql/lib/comp
lex.so', 'complex_to_point'


LANGUAGE C;

The C declaration of the function could be:


Point * complex_to_point (Complex *z)

{


Point *p;



p = (Point *) palloc(sizeof(Point));


p
-
>x = z
-
>x;


p
-
>y = z
-
>y;





return p;

}

C.6 PostgreSQL Create Table Syno
psis

CREATE [ [ LOCAL ] { TEMPORARY | TEMP } ] TABLE table_name (


{ column_name data_type [ DEFAULT default_expr ] [ column_constraint [, ... ] ]


| table_constraint } [, ... ]

)

[ INHERITS ( parent_table [, ... ] ) ]

[ WITH OIDS | WITHOUT OIDS ]


CREATE TABLE will create a new, initially empty table in the current database. The table will be owned
by the user issuing the command. CREATE TABLE also automatically creates a data type that represents
the tuple type (structure type) corresponding to on
e row of the table. Therefore, tables cannot have the same
name as any existing data type.


Weather DB

Design Document


Bill Luoma

December 13, 2002



27

Multiple inheritance via the INHERITS clause is a PostgreSQL language extension. SQL99 (but not
SQL92) defines single inheritance using a different syntax and diff
erent semantics. SQL99
-
style inheritance
is not yet supported by PostgreSQL


The optional INHERITS clause specifies a list of tables from which the new table automatically inherits all
columns. If the same column name exists in more than one parent table,
an error is reported unless the data
types of the columns match in each of the parent tables. If there is no conflict, then the duplicate columns
are merged to form a single column in the new table. If the column name list of the new table contains a
colum
n that is also inherited, the data type must likewise match the inherited column(s), and the column
definitions are merged into one. However, inherited and new column declarations of the same name need
not specify identical constraints: all constraints pro
vided from any declaration are merged together and all
are applied to the new table. If the new table explicitly specifies a default value for the column, this default
overrides any defaults from inherited declarations of the column. Otherwise, any parents

that specify default
values for the column must all specify the same default, or an error will be reported. Unique constraints and
primary keys are not inherited in the current implementation. This makes the combination of inheritance and
unique constrain
ts rather dysfunctional.

C.7 PostgreSQL Create Index Synopsis

CREATE [ UNIQUE ] INDEX index_name ON table


[ USING acc_method ] ( column [ ops_name ] [, ...] )


[ WHERE predicate ]


CREATE INDEX constructs an index index_name on the specified table.

Indexes are primarily used to
enhance database performance. But inappropriate use will result in slower performance. The key field(s) for
the index are specified as column names. Multiple fields can be specified if the index access method
supports multico
lumn indexes.


PostgreSQL provides B
-
tree, R
-
tree, hash, and GiST access methods for indexes. The B
-
tree access method
is an implementation of Lehman
-
Yao high
-
concurrency B
-
trees. The R
-
tree access method implements
standard R
-
trees using Guttman's quadra
tic split algorithm. The hash access method is an implementation of
Litwin's linear hashing.

C.8 Geometric Queries

Using PostgreSQL effectively requires knowing what spatial functions are available, and ensuring that
appropriate indexes are in place to pro
vide good performance. When constructing a query it is important to
remember that only the bounding
-
box operators such as "&&" can take advatage of the GiST spatial index.
Functions such as distance() cannot use the index to optimize their operation. For e
xample, the following
query would be quite slow on a large table:


SELECT name, date, city_boundary FROM city_table

WHERE distance(city_boundary, point(435.5, 564.45) < 1000;



This query is selecting the city areas of the city_table which are within 1000

of the point(435.5, 564.45). It
will be slow because it is calculating the distance between each point in the table and our specified point, i.e.
one distance() calculation for each row in the table. We can avoid this by using the "&&" operator to reduce
the number of distance calculations required:

Weather DB

Design Document


Bill Luoma

December 13, 2002



28


SELECT name, date, city_boundary FROM city_table

WHERE city_boundary && box(80750, 40780, 160700, 240100)


AND distance(city_boundary, point(435.5, 564.45) < 1000;


This query selects the same geometries, but

it does it in a more efficient way. Assuming there is a GiST
index on the city_boundary attribute, the query planner will recognize that it can use the index to reduce the
number of rows before calculating the result of the distance function.