PostGIS Lecture

righteousgaggleData Management

Jan 31, 2013 (4 years and 9 months ago)

216 views

Databases, Spatial
Databases, and PostGIS

Overview of relational database
concepts and PostGIS

DBMS Perspective

From Spatial Databases, A Tour, by Shekhar and Chawla

Role of DBMS


Typical small system architecture:

Role of DBMS


Larger spatial DBMS

From “Introduction to Spatial Data Management with Postgis,” by
Arnulf Christl
http://www.ccgis.de
,
http://www.mapbender.org/

Definitions

Database Management Systems (DBMS)





“Software that controls the organization, storage, retrieval,
security and integrity of data in a database. It accepts requests
from the application and instructs the operating system to transfer
the appropriate data.”
(Computer Desktop Encyclopedia, 2007)




Key features:


Data Security


Data Integrity


Interactive Query


Interactive Data Manipulation


Data Independence



May be
Network, Hierarchical, Object, Relational
.
Relational

is by far the most commonly
-
used and well
-
established, and handles most data management problems very
well.

Definitions

Relational Database Management Systems (RDBMS)




“A database that maintains a set of separate, related files
(tables), but combines data elements from the files for
queries and reports when required. The concept was
developed in 1970 by Edgar Codd, whose objective was to
accommodate a user's ad hoc request for selected data.”
[See
Codd Article

for details.]
(Computer Desktop Encyclopedia,
2007)



Data stored in separate tables, each containing tabular data
like a spreadsheet, joined together as needed.



In the early days of RDBMS, many vendors claimed to offer
relational databases when they did not


Codd came up
with
12 rules
defining the requirements for a database
system to be truly relational.


Tables

Relational tables have these properties:




Column Values Are of the Same Kind



Each Row is Unique



The Sequence of Columns is not significant



The Sequence of Rows is not significant



Each column must have a unique name (within the table)



Represents a single
entity
. Example:


Columns

Table columns have these properties:



They have a data type (similar to variables, but slightly different)


char,
varchar, number, float, text, BLOB (image), etc.



Column values should be independent from each other.



Values may be required (not null), or nullable. (Some databases differentiate
between null and zero
-
length string).



Columns may be indexed to improve access speed.



A column may be used as the basis of the order of the physical data (clustered
index).



Each represents an independent
attribute
.

Primary Keys


The
primary key

is an attribute or a set of attributes that uniquely
identify a specific instance of an entity.



Every entity in the data model must have a primary key whose
values uniquely identify instances of the entity.



To qualify as a primary key for an entity, an attribute must have the
following properties:



it must have a non
-
null value for each instance of the entity.


the value must be unique for each instance of an entity.


the values must not change or become null during the life of each entity
instance






Composite and Artificial Keys


Sometimes more than one attribute is required to
uniquely identify an entity. A primary key that made up
of more than one attribute is known as a
composite
key
.




An
artificial key

is one that has no intrinsic meaning.
These can be very useful when no attribute has all the
primary key properties, or the meaning of the primary
key is otherwise complicated or conditional.




Foreign Keys


A
foreign key

is an attribute that defines a relationship
between two tables. Foreign keys provide a method for
maintaining integrity in the data (referential integrity). Every
relationship between entities must be supported by a foreign
key.



A foreign key is an attribute in one table and a primary key in
another.



Typically, an RDBMS will let you specify whether dependent
entities are deleted (cascade delete) when a parent is deleted,
or disallowed. This ensures the referential integrity of the
database.



Foreign keys make possible establishing relationships in an
Entity
-
Relationship Diagram [see sample below].






Primary and Foreign Keys


Example:

http://msdn.microsoft.com/en
-
us/library/ms171971.aspx


Indexes


Disk reads are expensive.


Finding a record in a large table can be slow,
often requiring a full table scan, that is, checking
every row in the table.


Indexes allow for much more efficient searching.


If the performance of a query is bad, most of
the time it can be fixed with better indexes.


Indexes tend to decrease query time, but
increase insert/delete time, since they need to
be adjusted (automatically) when the data
changes.

Indexes

From Spatial Databases, A Tour, by Shekhar and Chawla

Indexes


Example of an index minimizing disk
access:

Indexes


Spatial data doesn’t have a simple natural ordering like most
traditional data types.


In PostGIS, interacting with spatial indexes is similar to other data
types, but internally is handled differently using the Generalized
Search Tree algorithm.


This allows flexible partitioning based on the kind of data being
indexed, rather than on an existing alphanumeric sequence


PostGIS has implemented spatial partitioning algorithms for GIST
indexes.



http://gist.cs.berkeley.edu/gist1.html

Other Features of RDBMS


Triggers


Transaction Support


Stored Procedures


Views


User
-
Defined Functions


User
-
Defined Data Types


Extended features:


Full
-
text search


Spatial data


Replication


Others…

Normalization


The most elementary structure for a database is a single table, which
corresponds to a spreadsheet or attribute table.



In this model there is one kind of entity (consisting of rows) which has a
uniform set of attributes (consisting of columns).



This is fast and simple, but very limited.



Example of un
-
normalized data


NYC PLUTO tax lot data:



Normalization


While this might adequately capture the data we need, it has some
drawbacks:



There will be lots of redundant data if many parcels are added.
The borough and owner names, for example, will all be
replicated for each row where these values are the same. This
makes maintaining the data difficult.



What happens if a single parcel is zoned for more uses than
the number of columns allow? Or if one parcel has several
owners?



How do you know for sure which parcel is which?



Normalization


The solution is to begin normalizing the data. You can look at this as a three
-
step process
implementing each of Codd’s “Normal Forms”.



First normal form rules:



A row of data cannot contain repeating groups of similar data (atomicity); and



Each row of data must have a unique identifier or primary key.



Second normal form rules:



No attribute can be dependent on only a portion of the primary key.



Every column must depend only on the entire primary key; if it is dependent on one or
more other columns, these should be moved into new tables.



Third normal form:




No dependencies within non
-
key attributes.




Normalization


Normalization usually involves creating multiple tables,
each of which complies with the three normal forms.


A normalized version of the PLUTO data, for example,
would likely have separate zoning and owner tables.


However, in practice it is often useful to de
-
normalize the
database structure. The PLUTO data in its current form
is extremely usable, and de
-
normalizing it for usability
makes sense.


Typically reporting applications will use some form of de
-
normalization to improve query performance, while more
transactional applications will use normalized database
structures to minimize redundant updates.


SQL


Structured Query Language is the interface you use to communicate with an
RDBMS.



It consists of
Queries
;
DML



data manipulation language; and
DDL



data
definition language.



SQL is standardized, but different DB vendors have different flavors and
extensions (Oracle Spatial, for example, adds spatial keywords to SQL).



SQL is not really a full featured language like C#, although most database
vendors have SQL
-
based languages like PL/SQL or TSQL that let you embed
SQL statements directly in procedural code.



Typically, a program will interact with a database by submitting SQL statements,
one by one, to a database using a data access layer that sends the requests to
the database and returns the results to the program.



SQL

DDL statements
-

examples:



CREATE TABLE

-

creates a new database table



ALTER TABLE
-

changes a database table’s structure



DROP TABLE

-

deletes a database table




CREATE INDEX

-

creates an index



DROP INDEX

-

deletes an index


SQL

Queries:



SELECT



extracts data from a database table


DML statements:



UPDATE

-

updates data in a database table



DELETE

-

deletes data from a database table



INSERT
-

inserts new data into a database table


SQL

Return all the rows in a table:


SELECT * from TableName


Order by a field:


SELECT * from TableName

ORDER BY ColumnName


Add a where clause:


SELECT * from TableName

WHERE ColumnName < 5

ORDER BY ColumnName


Joining two tables
:


SELECT Table1.Column1, Table2.Column1

FROM Table1 INNER JOIN Table2

ON Table1.Id = Table2.Id


Others:


SELECT DISTINCT Column1 from TableName

SELECT COUNT(*) from TableName

see
this discussion

for others.




http://db.grussell.org/imp.html


Data Access Layer


Databases come with interactive SQL query interfaces.


Programmatic interfaces are provided by the DB vendors
as well as third party vendors.


There are many different standards and conventions
such as ODBC, ADO, ADO.NET, JDBC, OLE DB (to see
a list of Microsoft’s for example, see
http://msdn2.microsoft.com/en
-
us/library/ms810810.aspx
.

Data Access Layer


Typical application sequence:


Connect to database


Requires editable string with database
connection info; usually stored in
configuration file.


Requires exception handling in case
connection is unsuccessful!


Execute SQL


Iterate through results (if query)


Display/output data


Close connection

Data Access Layers


ADO.NET is commonly used in C#:



SQLServerConnection Conn = new SQLServerConnection("host=nc
-
star;port=1433; User
ID=test01;Password=test01;




Database Name=Test");


try


{



Conn.Open();


}


catch (SQLServerException ex)


{



Console.WriteLine(ex.Message); return;


}



try


{



string strSQL = "SELECT ename FROM emp WHERE sal>50000";



SQLServerCommand DBCmd = new SQLServerCommand(strSQL, Conn);



SQLServerDataReader myDataReader;



myDataReader = DBCmd.ExecuteReader();




while (myDataReader.Read())



{




Console.WriteLine("High salaries: " + myDataReader["ename"].ToString());



}



myDataReader.Close();



Conn.Close();


}


catch (Exception ex)


{



Console.WriteLine(ex.Message);



return;


}




Spatial Data


Spatial data can be represented using a relational database structure.









But this is complex and inefficient, requiring multiple tables to capture
topology.


Usually spatial data can best be captured by a binary column that contains
all of the geometry associated with a record.


A table in a spatial database will usually have a single geometry column of
a single type, although a generic Geometry column can support multiple
geometry types.

From Spatial Databases, A Tour,
by Shekhar and Chawla


Spatial Databases


Spatial databases are usually built on top of RDBMS.



Spatial requirements are more involved, including
referential integrity based on topological geometrical
relationships, not just foreign
-
key constraints.



OGC has a SQL spec for simple features storage very
similar to the simple features spec used for geo tools:
http://www.opengeospatial.org/standards/sfs



PostGIS complies with OGC spec.

Spatial Databases

Sample join with spatial columns:

From Spatial Databases, A Tour, by Shekhar and Chawla


Spatial Databases

Mechanics of a spatial query:

From Spatial Databases, A Tour, by Shekhar and Chawla


PostGIS


PostGIS is the spatial add
-
on for PostgreSQL


Compares well to others but is free!
(
www.postgresql.org
).


Based on relational structure with support for spatial
data types and spatial functions.


Can be used to serve up map layers.


Can also be used for typical database transactional
functions.


No intrinsic support for raster data, unlike ArcSDE or
Oracle Spatial.

PostGIS


RDBMS use database tables and columns to represent the database
structure itself.


PostGIS supports GIS data by adding additional metadata tables:




spatial_ref_sys



contains a definition for each available
coordinate system and projection. These can be added or edited
as needed.


geometry_columns



contains a definition of each geometry
column in the current database.



http://www.opengis.org/techno/interop/EPSG2WKT.TXT


http://www.opengis.org/techno/specs.htm



PostGIS


PostGIS geometry is based on the Open Geospatial
Consortium simple features specification:

http://www.opengeospatial.org/standards/sfs


PostGIS


Uses human
-
readable well
-
known text format to
represent geometry (converts to binary internally):


From “Introduction to Spatial Data Management with Postgis,” by
Arnulf Christl
http://www.ccgis.de
,
http://www.mapbender.org/

PostGIS


Sample PostGIS SQL
-

note special role of
AddGeometryColumn, which manages
geometry_columns table.

From “Introduction to Spatial Data Management with Postgis,” by
Arnulf Christl
http://www.ccgis.de
,
http://www.mapbender.org/

PostGIS


Examples of supported functions


Comparisons:


ST_Equals, ST_Disjoint, ST_Intersects, ST_Touches,
ST_Crosses, ST_Within, ST_Contains, and
ST_Overlaps.



Set operations:


ST_Intersection, ST_Difference, ST_Union,
ST_SymDifference, ST_Buffer, and ST_ConvexHull



Others:


AsText, GeometryFromText, Transform (re
-
project).


PostGIS


Non
-
spatial data can be loaded from the SQL interface using the Copy
command.


Spatial data can be loaded using a command
-
line utility that comes with the
database, or a GUI shape file loader.

From “Introduction to Spatial Data Management with Postgis,” by
Arnulf Christl
http://www.ccgis.de
,
http://www.mapbender.org/

PostGIS


Data can be exported from PostGIS using the query
interface.



Spatial data can be exported to a shape file using the
pgsql2shp command
-
line utility. The basic syntax is:




psql2shp [<options>] <database> <SQL query>



See
http://www.bostongis.com/postgis_quickguide_1_4.bqg


PostGIS


PGAdmin is easy
-
to
-
use admin interface for Postgresql
database. It includes a pane for DDL and an interactive
SQL query tool.

PostGIS


Sample application: NYC Solar Map


Vector data and raster tile metadata stored in
PostGIS database


Raster tiles stored on file system


Data accessed managed by C# web services


Web services communicate with web
application via JSON (Javascript object
notation), with GeoJSON extensions.


Client application uses Javascript to display
map layers and query C# web service