Dracones: Web-Based Mapping and Spatial Analysis for ... - PGCon

?

Dracones: Web-Based Mapping and Spatial Analysis for Public Health Surveillance


Christian Jauvin

David Buckeridge


McGill University

?

Summary



Dracones:


Built with MapServer/PostGIS


We'll be covering:


Public Health context


Software architecture


Some specific problems

?

Public Health - Two Perspectives



Case management


Individual cases of notifiable diseases


Relationship networks


Population surveillance


Larger risk patterns

?

Case Management


Questions/problems:


Is a case due to recent transmission?


If so, does the case share any feature with
other, recent cases?


Ways it's being done:


Investigations/interviews


Meeting with other investigators


?

Population Surveillance


Questions/problems:


Are more cases happening than expected?


Does an excess suggest ongoing
transmission in a specific region?


Way it's being done:


Semi-automated routine temporal and space-time statistical analysis (SaTScan)



?

Montreal DSP


Département de santé publique de Montréal (Public Health Agency)


Need: incorporate spatial data + analysis
capabilities within workflow


One reason: research shows that spatial
information helps


Answer: Dracones project


Funded in part by GeoConnections


Led by David Buckeridge, MD, PhD


15-month contract

?

Case Management at the DSP


Current Situation


Information on paper
entered into system (Oracle
DB + Forms)


System contains sensitive
data (names, addresses)


Limited tools for analyzing
case data


Project Goal


Capture spatial data


Visualize and analyze
spatial distribution of cases

?

Population Surveillance at the DSP


Current Situation


Routine temporal and space-time statistical analysis


Capacity to visualize time series but not maps


Project Goal


Add mapping capacity


Extend range of
analytic methods

?

Why Location Matters - Case Management



If you are studying a case of a certain disease that was just reported



It is harder to picture the situation by looking at something like this...

?

Why Location Matters - Case Management

?

Why Location Matters - Case Management



Than by looking at this...

?

Why Location Matters - Case Management

?

Why Location Matters - Population Surveillance



If you are studying the spatial distribution of a set of disease clusters



This would seem more difficult...

?

Why Location Matters - Population Surveillance

?

Why Location Matters - Population Surveillance



Than this...

?

Why Location Matters - Population Surveillance

?

Development Process


Management Team


Led by public health MD with informatics
training


Members from each area of DSP involved


User Involvement


Users on management team


Input throughout requirements, design,
development

?

Software Required and Our Choices

Software Type Required            Our Choice
~GIS                              MapServer
General + Spatial DB              PostgreSQL + PostGIS
Cartography-enabled client        HTML/Javascript
Analytical / statistical tools    SaTScan, R, Python

?

Web Architecture Benefits



Usually lighter/simpler technologies


Cross-platform


Ease of deployment and integration


Builds on existing set of conventions and
behaviours

?

System Architecture

[Diagram; labelled components:]

Current Case Management System: Oracle DB, Oracle Forms

Web client

Bridge

Python, R, SaTScan

Dracones: Apache + PHP, MapServer + MapScript, PostgreSQL/PostGIS DB

?

Client Side - UI


UI is 100% Javascript (ExtJS library)


Future project: extract the map-manipulation parts:


Tile-based panning


Zooming


Layer activation


And release them under an open-source license

?

Client Side - Functions



From the results of a query performed in
the Oracle client, launch the application to
visualize the results


Inspect those results by varying certain
parameters


Launch external analysis tools


?

Server Side - MapServer



MapServer: open-source tool that adds geospatial content to web applications


Can be used as a CGI


Interfaces with many programming languages


Works very closely with PostGIS

?

Server Side - MapServer


MapServer with Apache 2.2, using PHP5


Linux and Windows


Since MapServer is stateless, each interaction must:


Build a map object from a base mapfile


Modify the map object (according to client
parameters)


Return rendered map as a file to the client
(that will display it)
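

The three steps above can be sketched in a few lines. This is only an illustration of the stateless request cycle: plain Python dictionaries stand in for MapServer's map object, and `BASE_MAP`, `render`, `handle_request` and the parameter names are hypothetical. None of them belong to MapServer or MapScript, and the real system described here uses PHP.

```python
import copy

# Hypothetical stand-in for the content of a base mapfile.
BASE_MAP = {
    "extent": (0.0, 0.0, 100.0, 100.0),
    "layers": {"regions": True, "cases": False},
}


def render(map_obj):
    """Pretend to draw the map; a real renderer would produce an image file."""
    visible = sorted(name for name, on in map_obj["layers"].items() if on)
    return "extent=%s layers=%s" % (map_obj["extent"], ",".join(visible))


def handle_request(params):
    """Serve one request without keeping any state between calls."""
    map_obj = copy.deepcopy(BASE_MAP)          # 1. build from the base definition
    if "extent" in params:                     # 2. modify per client parameters
        map_obj["extent"] = tuple(params["extent"])
    for name, on in params.get("layers", {}).items():
        map_obj["layers"][name] = on
    return render(map_obj)                     # 3. return the rendered result


first = handle_request({"extent": [10, 10, 50, 50], "layers": {"cases": True}})
second = handle_request({})  # starts again from the base map
```

Because every call starts from a fresh copy of the base definition, the second request is unaffected by the layer that the first one switched on.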

?

MapServer - Layers



A map object is made of layers


A layer can be loaded from a shapefile (ESRI open format), which specifies its geometry


Or it can be loaded directly from a
PostGIS table

?

PostGIS



PostGIS: spatial extension for
PostgreSQL


Adds geometry types (points, lines,
polygons, etc)


Spatial functions and operators (distance,
convex hull, intersection, etc)


Spatial indexes

?

PostGIS


Queries that mix spatial and non-spatial aspects of the data


If you have a case table:

case_id   condition   region_id
1         TB          10
2         Gastro      20

?

PostGIS


And a region table:

region_id   name          geom
10          Centre-Sud    POLYGON(…)
20          Hochelaga     POLYGON(…)

?

PostGIS


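You can then build a query like this:


-- "case" is quoted because CASE is a reserved word in PostgreSQL
-- ST_Within and ST_GeomFromText are the current PostGIS names for within and GeomFromText
SELECT * FROM "case", region
WHERE "case".condition = 'TB'
AND "case".region_id = region.region_id
AND ST_Within(region.geom,
              ST_GeomFromText('POLYGON(…)'));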

?

PostGIS



A MapServer layer can be built simply by adding a connection attribute pointing to the PostgreSQL table (two lines, really!)



Shapefile and table sources can be mixed

?

Analysis Tools - SaTScan



Requirement: interfacing with analysis tools


SaTScan: detection of space-time clusters


Scans for areas where the probability of being a case is significantly higher inside the area than outside it

?

Analysis Tools



Since it's a command-line tool without an open API, we use Python to run it, parse the results and plot them using MapServer



We do the same for some external R
routines
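

The run-and-parse pattern can be sketched as follows. This is a minimal illustration, not the project's code: the slides show neither SaTScan's command line nor its report layout, so the "label: value" format parsed here is a simplified assumption, and a one-line Python command stands in for the external tool so that the sketch runs anywhere. `run_tool`, `parse_report` and `stand_in_tool` are hypothetical names.

```python
import subprocess
import sys
from typing import Dict, List


def run_tool(command: List[str]) -> str:
    """Run a command-line tool and return its standard output as text."""
    completed = subprocess.run(command, capture_output=True, text=True, check=True)
    return completed.stdout


def parse_report(report: str) -> Dict[str, str]:
    """Collect 'label: value' lines from a plain-text report (assumed layout)."""
    fields = {}
    for line in report.splitlines():
        label, separator, value = line.partition(":")
        if separator and value.strip():
            fields[label.strip(" .")] = value.strip()
    return fields


# Stand-in for the external analysis tool: it just prints two report lines.
stand_in_tool = [
    sys.executable,
    "-c",
    "print('P-value: 0.01'); print('Radius: 2.5 km')",
]
cluster = parse_report(run_tool(stand_in_tool))
```

The parsed dictionary is the kind of intermediate result that could then be turned into a map layer for plotting.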

?

System Data Sources



Health data


Reportable disease database


Ancillary data on contacts


Geographical data


Street networks and postal code file


Health regions, census, postal boundaries

?

Using Address Data from a Public
Health Database



Problem: addresses are stored as
character fields:




No validation at the entry point


Data quality is compromised



Address: 1500-a Sherbroooke St. Ouest

?

Two Problems with Address
Processing



The addresses need to be parsed, and the many possible transcription errors and ambiguities must be resolved



Addresses that refer to the same place must be identified and treated as a single object

?

Possible Solutions



These could be solved in a more SQL-integrated manner: edit distance module for PostgreSQL (?)



However, we decided to take the procedural route (using Python)
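

For reference, edit distance is straightforward to compute procedurally. The sketch below is a standard Levenshtein distance plus a hypothetical helper, `closest_street`, that picks the nearest known street name within a tolerance. The street list and the threshold of 2 are illustrative assumptions, not values from the project.

```python
from typing import Iterable, Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def closest_street(name: str, known: Iterable[str],
                   max_distance: int = 2) -> Optional[str]:
    """Return the known street name nearest to `name`, or None if too far."""
    best, best_distance = None, max_distance + 1
    for candidate in known:
        distance = edit_distance(name, candidate)
        if distance < best_distance:
            best, best_distance = candidate, distance
    return best


KNOWN_STREETS = ["sherbrooke", "saint-denis", "ontario"]  # made-up reference list
corrected = closest_street("sherbroooke", KNOWN_STREETS)
```

With this reference list, the misspelling "sherbroooke" is one deletion away from "sherbrooke", so it falls within the tolerance.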

?

Address Validation Algorithm - Requirements



A database with (1) the street network
geometry


(2) the street segment address ranges


And (3) the postal code geometry and
street range association

?

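Address Validation Algorithm

So you will know, for instance, that:

[Diagram: street segments labelled with odd address numbers 1001, 2001, 3001, even address numbers 998, 1998, 2998, and postal codes H2X2T1 and H2X2T2]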

?

Address Validation Algorithm - Steps


Parse each text address into 3 tokens:


{S#, SN, PC} (street number, street name, postal code)


For each triplet:


Try to find an exact match, while being tolerant on SN (maximum coverage, edit distance...)


Still tolerant on SN, try varying PC


Likewise tolerant on SN, fix PC and vary S#
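

The steps above can be sketched as a small cascade. This is an illustration under several assumptions, not the project's implementation: the regular expressions, the `STREET_TYPES` word list, the `Address` and `Segment` types, the 0.85 similarity threshold and the status labels are all hypothetical. Tolerance on the street name is approximated with `difflib.SequenceMatcher` from the standard library rather than the measures named on the slide, the pairing of postal codes with address ranges is made up, and odd/even street sides are ignored.

```python
import re
from difflib import SequenceMatcher
from typing import List, NamedTuple, Optional, Tuple

POSTAL_CODE = re.compile(r"\b([A-Za-z]\d[A-Za-z])\s?(\d[A-Za-z]\d)\b")
STREET_NUMBER = re.compile(r"^\s*(\d+)(?:\s*-\s*\w+)?\s+")
STREET_TYPES = {"st", "street", "rue"}  # hypothetical list of words to drop


class Address(NamedTuple):
    number: Optional[int]       # S#
    street: str                 # SN
    postal_code: Optional[str]  # PC


class Segment(NamedTuple):
    street: str
    low: int
    high: int
    postal_code: str


def parse_address(text: str) -> Address:
    """Split a free-text address into the {S#, SN, PC} triplet."""
    postal_code = None
    match = POSTAL_CODE.search(text)
    if match:
        postal_code = (match.group(1) + match.group(2)).upper()
        text = text[:match.start()] + text[match.end():]
    number = None
    match = STREET_NUMBER.match(text)
    if match:
        number = int(match.group(1))  # a unit suffix such as "-a" is discarded
        text = text[match.end():]
    words = [word.strip(".,") for word in text.lower().split()]
    street = " ".join(w for w in words if w and w not in STREET_TYPES)
    return Address(number, street, postal_code)


def validate(addr: Address, segments: List[Segment],
             min_ratio: float = 0.85) -> Tuple[str, Optional[Segment]]:
    """Apply the three matching steps in order, always tolerant on SN."""
    near = [s for s in segments
            if SequenceMatcher(None, addr.street, s.street).ratio() >= min_ratio]

    def in_range(segment: Segment) -> bool:
        return addr.number is not None and segment.low <= addr.number <= segment.high

    for s in near:   # step 1: S# and PC both agree
        if in_range(s) and s.postal_code == addr.postal_code:
            return "match", s
    for s in near:   # step 2: keep S#, accept a different PC
        if in_range(s):
            return "postal code corrected", s
    for s in near:   # step 3: keep PC, accept a different S#
        if s.postal_code == addr.postal_code:
            return "street number corrected", s
    return "unrecoverable", None


SEGMENTS = [
    Segment("sherbrooke ouest", 998, 1998, "H2X2T1"),
    Segment("sherbrooke ouest", 1998, 2998, "H2X2T2"),
]
status, segment = validate(
    parse_address("1500-a Sherbroooke St. Ouest H2X 2T1"), SEGMENTS)
```

Ordering the steps from strictest to loosest means a record is only "corrected" when no stricter interpretation fits.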

?

Address Validation Algorithm - Batch Results


By doing a batch analysis of the DSP data
(105K records), we found that:


84% of the address records were "exact"


14.5% were recoverable errors


1.5% were non-recoverable errors
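

To put these percentages in terms of records, the short calculation below applies them to the stated total. Since "105K" is itself a rounded figure, the resulting counts are approximations, not numbers reported by the authors.

```python
TOTAL_RECORDS = 105_000  # "105K records", a rounded figure

# Shares expressed in tenths of a percent so the arithmetic stays in integers.
SHARES = {"exact": 840, "recoverable": 145, "non-recoverable": 15}

counts = {label: TOTAL_RECORDS * share // 1000 for label, share in SHARES.items()}
for label, count in counts.items():
    print(f"{label}: about {count:,} records")
```

In other words, roughly one record in six needed some correction, and only a small remainder could not be recovered at all.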

?

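Last Address Processing Step: Geocoding

Geocoding by interpolation:

[Diagram: the address 1500 Sherbrooke located along street segments labelled with odd address numbers 1001, 2001, 3001, even address numbers 998, 1998, 2998, and postal codes H2X2T1 and H2X2T2]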
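

Geocoding by interpolation places an address along a street segment in proportion to where its number falls within the segment's address range. The sketch below assumes straight segments and planar coordinates; the function name `interpolate_address` and the end-point coordinates are made up for illustration, and only the address numbers come from the slide.

```python
from typing import Tuple

Point = Tuple[float, float]


def interpolate_address(number: int, range_start: int, range_end: int,
                        seg_start: Point, seg_end: Point) -> Point:
    """Linearly interpolate a street number along a straight segment."""
    low, high = sorted((range_start, range_end))
    if not low <= number <= high:
        raise ValueError("street number is outside the segment's address range")
    if range_start == range_end:
        fraction = 0.5  # single-number range: fall back to the segment midpoint
    else:
        fraction = (number - range_start) / (range_end - range_start)
    x = seg_start[0] + fraction * (seg_end[0] - seg_start[0])
    y = seg_start[1] + fraction * (seg_end[1] - seg_start[1])
    return (x, y)


# 1500 is even, so it is placed within the even range 998-1998.
# The segment end points are made-up planar coordinates (for example, metres).
location = interpolate_address(1500, 998, 1998, (0.0, 0.0), (1000.0, 0.0))
```

Here 1500 lies 502/1000 of the way through the range, so the point lands just past the middle of the segment.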

?

A Last Problem



DSP management system is read-only (for us)


Not spatially enabled


Must not affect performance

?

And its Solution


Create a mirror of the DSP data model, using PostgreSQL


Augmented with spatial aspects (and better-suited address handling)


Refreshed periodically


Reprocessing of content that has changed


Extraction of new content
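

One way to decide what a periodic refresh has to do is to compare a fingerprint of each source record with the one stored at the previous refresh. The slides do not say how changes are actually detected, so this is only a sketch under that assumption; `fingerprint`, `plan_refresh` and the sample records are hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple

Record = Dict[str, str]


def fingerprint(record: Record) -> str:
    """Stable hash of a record's fields, independent of key order."""
    payload = "|".join(f"{key}={record[key]}" for key in sorted(record))
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


def plan_refresh(source: Dict[int, Record],
                 mirrored: Dict[int, str]) -> Tuple[List[int], List[int]]:
    """Return (new ids to extract, changed ids to reprocess)."""
    new_ids, changed_ids = [], []
    for record_id, record in source.items():
        known = mirrored.get(record_id)
        if known is None:
            new_ids.append(record_id)
        elif known != fingerprint(record):
            changed_ids.append(record_id)
    return new_ids, changed_ids


# Made-up sample data: record 2 changed since the last refresh, record 3 is new.
source = {
    1: {"address": "1500 Sherbrooke Ouest"},
    2: {"address": "10 Ontario Est"},
    3: {"address": "200 Saint-Denis"},
}
mirrored = {
    1: fingerprint(source[1]),
    2: fingerprint({"address": "10 Ontaro Est"}),
}
new_ids, changed_ids = plan_refresh(source, mirrored)
```

Only the new and changed records then need to go through address validation and geocoding again, which keeps the load on the read-only source system low.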


?

A Challenge



Interface and extend existing:


System


Environment (including an important
community of users and developers)


?

Lessons Learned


Very strong interest in using spatial information at the
DSP but infrastructure, skills and data quality are limiting


Large effort to validate and correct all addresses


The science of spatial analysis in public health often
lags the technology


How to analyze multiple locations for each individual?


How important is spatial location in an urban area?


Open-source, web-based mapping software and spatial databases (MapServer, PostGIS) are robust and easy to work with for skilled developers


?

Acknowledgements


GeoConnections, CIHR


McGill University


Aman Verma, Sherry Olsen, Andrew Carter


Montreal DSP


Louise Marcotte


Robert Allard, Lucie Bedard, André Bilodeau


Montreal Chest Institute


Kevin Schwartzman, Jonathan Richard


Alice Zwerling, Marie-Josee Dion