Spatial Postgresql: Using PostGIS - Michigan State University

arizonahoopleData Management

Nov 28, 2012 (4 years and 8 months ago)

281 views

Page |
1



GEO 425: SPRING 2012

LAB 11: Spatial Postgresql: Using PostGIS


Objectives
: In the past two database labs you worked on SQL queries. These ar
e useful skills for spatial and
non
-
spatial data alike. This lab specializes on the spatial functionality of
Postgresql, which is enabled by an add
-
on called PostGIS. This exercise covers:


1. PostGIS spatial data types


2. Importing and exporting shapefiles


3. Spatial queries


4. Spatial functions


About PostGIS:

PostGIS is not really a complete GIS

-

it has no special graphic or cartographic functions. It is an open source
library that provides Postgresql with vector spatial data types and functionality equivalent to a basic vector GIS.
Along with Postgresql it serves as the back end of a variety of

GIS implementations and web map servers. Learn
more or download PostGIS at:
http://postgis.refractions.net/



Instructions


Step 1: Download and Start up

1. Create a directory called lab
11

and save the zipped shapefile from the web page to this directory. Fire up a
terminal window and

cd


to the directory. List its contents. To unzip the compressed shapefile, type:

unzip lake_points_200403.zip

This is point data about every lake larger tha
n 10 acres in Michigan, downloaded from MCGI's website. Load it
in Quantum GIS and have a look.


2. Start the postgresql interpreter using the

psql


command from last time. Use

\
d


to remind yourself about
the tables in your database. geometry_columns an
d spatial_ref_sys are automatically
-
created tables to spatially
enable postgresql; we'll use them some in this lab.


Step 2: Creating and manually populating a spatial table

1. Begin by creating a table that has a name field and a geometry field


point ge
ometry:

CREATE TABLE waypoints (name VARCHAR, pt GEOMETRY);


2. Now put some data in there.
Like these
GPS waypoints:

INSERT INTO waypoints VALUES('Meadow', 'POINT(706468


4735062)');

INSERT INTO waypoints VALUES('Heimat', 'POINT(706491

4734853)');

INSERT INTO waypoints VALUES('Marble', 'POINT(707638

4735153)');

INSERT INTO waypoints VALUES('Geo', 'POINT(706888

4733845)');


Step 3: Basic spatial queries

1. Simply querying your new table is surprising:

SELECT * FROM waypoints;


That's because the points type is not text format. Fortunately there's a converter function, ST_asText:

SELECT name, ST_asText(pt) FROM waypoints;

Use this function whenever you want to print spatial fields. Otherwise your queries won't be pretty!




Page |
2



2. He
re is a more advanced query. This uses a function to calculate distance between each point in the table
and another point:

SELECT name, ST_asText(pt), ST_Distance(pt, 'POINT(706491 4734853)') FROM waypoints;


Step 4: Importing a shapefile

1. Manually enter
ing the hundreds or thousands of features in a typical GIS dataset would be painful if not
impossible. Fortunately PostGIS comes with the ability to import and export spatial data files. These files are
external to the Postgresql interpreter, however. So,
start by quitting the interpreter with

\
q
”.


2. Type

ls


to make sure your shapefile is there. The relevant files should be called:

Lake_points_200403.shp,
.
shx,
.
dbf, and so on.


You are going to run a command that will convert the data in the shapefile

into a text sql file. This command is

shp2pgsql

. You will pipe the results of this command to a new file, called

mi_lakes.sql


(you could call it
mickey.mouse if you wanted, but this name is more intuitive). From the Unix prompt:

shp2pgsql
-
s 3078 Lake
_points_200403 lakes > mi_lakes.sql


The
-
s argument we passed is 3078. This is the projection code, called the SRID, for the coordinate reference
system employed by this data. 3078 is the code for Michigan GeoRef. PostGIS knows many hundreds of
reference
systems, each with their own code. 'lakes' is what we want to call the table when we construct it in
Postgresql. The > key 'pipes', or transfers, the result to a text file, called mi_lakes.sql


Note
: This did not actually create the table in Postgres; it c
reated a file that can create a table called 'lakes' in
Postgres! This file is a lot like census.sql and election.sql that you used in an earlier lab.


At the Linux prompt, type:

more mi_lakes.sql

You will see that it is a text file full of SQL statements. Type

q


to exit back to the Linux prompt.


3. Actually import the sql file with this Unix command:

psql
-
d [your db name]
-
f mi_lakes.sql

Most of that line should look familiar. The
-
f key specif
ies that you want to pass psql a file rather than starting
the interpreter. The result will be as if you'd typed a whole bunch of SQL statements at the interpreter. You will
see a lot of output before the prompt returns.


Step 5: Querying the spatial data

1. Start the postgresql interpreter & list tables with

\
d

. Do you see lakes? Type

\
d lakes


to list the fields.


2. Do a simple query; find all the lakes in Ingham County:

SELECT * FROM lakes WHERE county='Ingham';


Read across; note that this shapefile

thoughtfully stored coords in a couple of coordinate systems
-

these are
attributes. The last column has the actual coordinates stored in the shapefile. This column is mysteriously titled
'the_geom'. And it is
not meaningful ascii text. Hit “q” to quit fr
om a “more”

listing. Let's print some fields from
the table more meaningfully:

SELECT lake_name, x_coor_m, y_coor_m, ST_asText(the_geom) FROM lakes WHERE county='Ingham';


3. Hopefully coordinates in the_geom line match the others! What is the coordinate s
ystem?
________________ (check the html that came with the original shapefile).


Page |
3



4. So, what does Postgres think the SRID is? Well, it's stored in the_geom:

SELECT SRID(the_geom) FROM lakes;

It's a little annoying to have to hit 'q' every time on this larg
e table. How about this:

SELECT DISTINCT(SRID(the_geom)) FROM lakes;


Step 6: More Spatial and Nonspatial Queries

1. There must be a lot of lakes in Michigan over ten acres. How many? This query will tell you:

SELECT COUNT(lake_name) FROM lakes;


2. What a
bout the number of lakes per county? To do this, use the GROUP BY modifier:

SELECT COUNT(lake_name) FROM lakes GROUP BY county;


And you can put those in order with a better name:

SELECT COUNT(lake_name) AS lake_total FROM lakes

GROUP BY county ORDER BY
lake_total DESC;


However, the query doesn't return county names, which would be useful. Note that to make complex queries
we start with a simpler one, make it work, and then extend it to do more complicated stuff. That's a good
strategy in general:

SELECT

county, COUNT(lake_name) AS lake_total

FROM lakes GROUP BY county ORDER BY lake_total DESC;


Finally, can we return simply the county with the maximum number of lakes? Yes, by limiting the return to a
single record:

SELECT county, COUNT(lake_name) AS lake
_total FROM lakes GROUP BY county

ORDER BY lake_total DESC LIMIT 1;


You could return the 10 counties with the most lakes by putting a 10 at the end of the expression instead of a 1.
What is the county with the 10
th

most lakes? ________________


3. Special

functions are required to extract values from the geometry of your spatial table. Here's how to get all
the x and y coordinates (these are point features, not polygon):

SELECT lake_name, ST_X(the_geom), ST_Y(the_geom) FROM lakes;


4. The typical modifier
operations work on these. For example, to calculate the average x and y:

SELECT AVG(ST_X(the_geom)) AS avg_X, AVG(ST_Y(the_geom)) AS avg_Y FROM lakes;


5. Calculating distance between two features in the same table is tricky. To do it, you need to create t
wo single
-
row tables from lakes using a couple of SELECT commands. Here is the query; read it carefully:

SELECT ST_Distance(lk1.the_geom, lk2.the_geom)

FROM (SELECT * FROM lakes WHERE lake_name='Schaawe Lake') AS lk1,

(SELECT * FROM lakes WHERE lake_name='
Hartland Millpond') AS lk2;


How could you convert the result from meters to kilometers in the query?

(Hint: look back at your previous SQL labs.)



Page |
4



Step 7: Working with multiple spatial datasets

1.
Dr. Shortridge

downloaded statewide data on wastewater discharge permits from the EPA at:


http://www.epa.gov/enviro/html/pcs/adhoc.html


After much preprocessing, the data were brought into a PostGIS
database and saved as a text SQL file called
epa_permits.sql. Download it from the web page, save it in your lab
11

directory, and unzip it. Coordinates are
lat
-
lon; what's a good reference ID? SRID 4326 seems ok
-

this code is for lat lon, datum is WGS84,
which is as
good as any guess.
This was figured
out with some Googling, but you can also look at the spatial reference IDs
and definitions in postgresql: they are stored in a table right in your database:

SELECT srid, srtext, proj4text FROM spatial_ref_sys
;

but it probably looks pretty confusing!

Here's a command that finds all rows with a Michigan
-
specific reference system:

SELECT srid, srtext, proj4text FROM spatial_ref_sys WHERE srtext ILIKE '%Michigan%';


That will include a variety of Michigan state pl
ane options as well as Michigan GeoRef.


2. From the Unix prompt, run the following command to import the sql file:

psql
-
d [your db name]
-
f epa_permits.sql


3. Back in psql, check to make sure you have a new table called

pca

. List the columns with

\
d
pca

. Is the SRID
right? Is it the same as lakes?

SELECT DISTINCT(SRID(the_geom)) FROM pca;

SELECT DISTINCT(SRID(the_geom)) FROM lakes;


It's not. That means if we want to do queries that involve both data sets, we will have to transform one to the
other.
We will do that right in the query. How many records are in pca? __________


4. Your goal is to find distances between wastewater discharge locations and lakes. Obviously you'd rather swim
and fish in lakes that are far from discharges! The final queries w
ill be pretty long and complicated, so let's
begin with smaller ones to get us there. Find the distance from a user
-
defined point to each lake in Ingham
County:

SELECT lake_name, ST_Distance(the_geom, 'POINT(534256 420432)')

FROM lakes WHERE county='Ingham
';


Hmm, it won't calculate distance because the reference system for the point is not defined. So, set the SRID for
the point, which can be done right in the query:

SELECT lake_name, ST_Distance(the_geom, ST_SetSRID('POINT(534256 420432)', 3078))

FROM lak
es WHERE county='Ingham';


What if the point is in a different coordinate system? Suppose it's a lon
-
lat point off a GPS? Here's a lat
-
lon
coordinate in East Lansing:
-
84.481, 42.735. In that case, you need to convert it in the query using
ST_Transform. As
sume the GPS point is in WGS84 (reasonable!); then its SRID is 4326. Try this:

SELECT lake_name, ST_Distance(the_geom,

ST_Transform(ST_SetSRID('POINT(
-
84.481 42.735)', 4326), 3078)) FROM lakes WHERE county='Ingham';


Finally, let's order these lakes to
find the closest to our location:

SELECT lake_name, ST_Distance(the_geom,

ST_Transform(ST_SetSRID('POINT(
-
84.481 42.735)', 4326), 3078)) FROM lakes WHERE county='Ingham'

ORDER BY ST_Distance(the_geom, ST_Transform(ST_SetSRID('POINT(
-
84.481 42.735)', 4326),

3078)) ASC;


How far is Lake Lansing, the biggest lake in the county, from this location? ________

Page |
5




5. Now let's get the wastewater data involved. Find the 5 closest wastewater permitted facilities to Lake
Lansing. We'll start by looking at Ingham County
lakes and identifying exactly what it is called:

SELECT * FROM lakes WHERE county = 'Ingham';


Looks like 'Lansing, Lake' is what we want:

SELECT * FROM lakes WHERE lake_name='Lansing, Lake' AND county='Ingham';


Ok, now for the first shot at the two
-
table

query. Note that we rename the lakes table to make it shorter to
type, and that we transform the geometry of the discharge table so that it matches the lakes reference system.
We also use LIMIT to return not the entire list of distances, but just the clos
est 5:


SELECT lk.lake_name AS lake, pca.name AS company, pca.sic_text AS type, ST_Distance(lk.the_geom,
ST_Transform(pca.the_geom, 3078))

FROM lakes AS lk, pca

WHERE lk.lake_name = 'Lansing, Lake'

ORDER BY ST_Distance(lk.the_geom,ST_Transform(pca.the_geom
, 3078)) ASC

LIMIT 5;


6. If your distance numbers look ridiculous (in the millions of meters, say) then your projection definition for
Michigan GeoRef has an error in it. This is not your fault, but you can fix it by updating your geometry table (the
bug
is that there is no '+no_uoff' parameter; you can see it has been added in the command below

(be sure to
have a space before the ‘+’ signs)
:

UPDATE spatial_ref_sys SET proj4text='+proj=omerc

+lat_0=45.30916666666666 +lonc=
-
86 +alpha=337.25556
+k=0.9996 +x
_0=2546731.496 +y_0=
-
4354009.816 +ellps=GRS80 +datum=NAD83 +units=m +no_defs +no_uoff'
WHERE srid=3078;


After running that line, run the distance query again. The coordinates should be transformed appropriately and
you should get the correct distances.


7. One last query. Let's

get all discharges within 2,000
meters of any lake in Ingham county:

SELECT lk.lake_name AS lake, pca.name AS company, pca.sic_text AS type, ST_Distance(lk.the_geom,
ST_Transform(pca.the_geom, 3078))

FROM lakes AS lk, pca

WHERE ST_
Distance(lk.the_geom, ST_Transform(pca.the_geom, 3078)) < 2000

AND lk.county = 'Ingham'

ORDER BY ST_Distance(lk.the_geom, ST_Transform(pca.the_geom, 3078)) ASC;


Step 8: Exporting queries as shapefiles

1. Suppose you'd like to save the last lake query as a

shapefile and pull it into a GIS. This is not hard.
Add the

lk.the_geom AS the_geom to the returned fields above, and then make a new table:

CREATE TABLE near_lakes AS [YOUR BIG QUERY HERE]


2. Then quit postgresql, and at the command prompt type:

pgsql2s
hp
-
f nrlakes
-
P [your password] [your db name] near_lakes


This will make a set of files comprising a shapefile named nrlakes.shp, .shx, .... View it in QGIS.


Step 9: Connecting directly to PostGIS

1. In Quantum GIS there is an option to open a PostGIS database connection. This will let you bypass the step of
Page |
6



generating a shapefile. In QGIS,
go to the


Layer
” menu and select “
Add PostGIS Layer
…”
. Hit

New

. For Name,
type 'My PostGIS DB'. Leave Host
blank, and type your database name (your user ID). Finally, enter your
user
name and password. Hit

Test Connect


to see if it works. If so, hit OK to go back to the main screen,
and
then

hit

Connect

. Your spatial tables should appear in the window. Click

on lakes and hit Add. It should pop up
in the viewer.


2. You can also enter a query here to get a subset. For example, go back to the
“Add PostGIS Layer…”, connect to
your PostGIS DB
, select the “lakes” table
, and hit the

Build query


button.
If there i
s any text in the “SQL where
clause” box, clear it. Then use the buttons above to build the query or simply
type
:


county


= '
Allegan
'. Hit
Test to make sure you get some rows, then hit OK.

Click “Add” to put the selected features as a new layer on
your m
ap.


Report

Write this up as a web page; post it to your 425 web site

and email your TA with the URL.
Include both the SQL
code and the answers to the following questions:

1. Drop the waypoints table. Indicate the command to do so.

2. What is the
projection/coordinate system of the Michigan lakes data?

3. What five counties have the fewest lakes in Michigan?

4. How many
“Silver Lake”s

are there in Michigan?

5. How many counties have at least 100 lakes in them?

6. What county has the most wastewater

discharge permits? How many permits does it have?

7. What is the name of the northernmost lake in the dataset? (Hint: you'll need a subquery, or you can use
LIMIT)

8. How many Oakland County lakes are within 1500 meters of a wastewater discharge permit?

9. Which five Michigan discharge permits are the farthest away from Burt Lake (Burt Lake is one of the biggest
inland lakes)?

10. How far is it from Sundog Lake to Wethea Lake?


Helpful PostGIS scripts

shp2pgsql converts a shapefile to an SQL

script

pgsql2shp converts a spatial SQL table to shapefile


Some helpful SQL commands:

LIKE

(<expression>)


expression occurs in something

ILIKE
(<expression>)


like LIKE, case
-
insensitive

ST_SRID
(the_geom)


return the
reference system ID

ST_X
(the_geom)


return x
-
coordinate

ST_Y
(the_geom)


return y
-
coordinate

ST_asText
(the_geom)


return text values, not binary

ST_Distance

(the_geom, the_geom) return distance between the two geometries

ST_Transfo
rm
(the_geom, SRID)

transform a geometry to the specified reference system

LIMIT

<number>


limit the returned records to <number>

COUNT
(<field>)



count the number of items in the field