Geospatial Methods in Health Geographical Research


Dec 11, 2013


Thank you to Anders for inviting me to speak with you as part
of your class on Medical Geography,
and thank you for attending.


We will consider three main topics.

First, we will consider key concepts in geospatial analysis and why this kind of
analysis is important.

Second, we will explore a range of geospatial methods by showing how they have
been used in health research.

Finally, we will think about some opportunities for advancing the use of geospatial
methods in medical geography.

At the end, I will also share a bit of information about what I will be doing
as a Visiting Professor at Lund University and explain how you
can download a copy of my presentation today in the form of notes pages showing
each slide over the text of my remarks, including citations and additional details.


Medical geography is the application of geographic concepts and methods to the
study of health. This is John Hunter’s definition from the 1970s.

More recently, geographers have broadened the field by referring to “geographies of
health,” as described in the book Geographies of Health, Second Edition, by Anthony
Gatrell and Susan Elliott. This enlarges the field to consider more than health
outcomes and delivery of allopathic services in medical care settings to include a
range of health issues and approaches society uses to address health concerns.

Health geographers adopt many scholarly approaches, as listed here.

The positivist approach looks for spatial patterns in empirical data. Social
interactionist approaches focus on the meaning of illness and health to the individual.
The structuralist view emphasizes historical political and economic processes
underlying health and access to health care. Structurationist approaches consider
human agency and constraints people face in seeking health resources and avoiding
health risks. The post-structuralist approach focuses on the “otherness” or labeling
and exclusion of the ill and discounting of lay beliefs about health and illness.

My talk today is situated in the positivist approach because it focuses on geospatial
methods in health geographic research.


What are geospatial methods?

These methods take into account location, specifically location on the earth.
Locations are explicit in geospatial methods.

These methods provide the key to understanding how results change when the
locations of the phenomena being analyzed change.

A broad range of numerical methods fall under the general heading of geospatial
methods.


It is important to distinguish the terms GIScience, geographic data, and GIS.

GIScience (geographic information science) is the research field seeking answers to
fundamental questions about how we represent and analyze geographic data [in Europe
and in Canada, this is sometimes referred to as “geomatics”].

Geographic data are data resulting from observation and measurement of earth
phenomena, referenced to their locations on the earth.

GIS is a digital system for integrating and analyzing geographic data.


There is no doubt that GIS have made important contributions to health research and public health
practice over the last 30 years. It is worth considering for a moment some of the key milestones in the
development of geospatial and related technologies and some of the key public health issues that
emerged at around the same time.

The first Landsat satellite for imagery was launched in 1972. For the 1980 Census in the U.S., the
innovative GBF/DIME files were developed to help conduct the census in urban areas. In the 1980s,
HIV/AIDS appeared, becoming a major global health problem.

The 1990s were a decade of enormous change in geospatial technology. The TIGER/Line files
developed by the U.S. Census were one of the first digital spatial databases in the U.S. with national
coverage and a major resource for vector-based GIS analysis. This decade also witnessed the launch of
the World Wide Web for Internet browsing, which led to the development of search engines such as
Google. During this time, we had the first desktop GIS software, the development of open spatial data
formats, and the creation of metadata standards for digital geospatial data. Also, the
completion of the Global Positioning System revolutionized our ability to capture data on location.
During this decade, the public health community focused attention on emerging infectious diseases
such as Lyme disease, and re-emerging diseases, many of which were believed to be under control in
developed countries. This period also witnessed the first GIS and health conferences in the U.S. with
federal agencies such as ATSDR (Agency for Toxic Substances and Disease Registry) and NCHS
(National Center for Health Statistics) taking the lead.

The 2000s brought the first GIS and health textbooks and journals such as the
International Journal of Health Geographics, an open access journal. GIS software companies began
to consider the health community as a major market for their products. Spatial statistical software
packages, GWR among them, were also developed, some fully and some more loosely coupled with GIS
software packages, helping us to explore health disparities and neighborhood contextual factors
affecting health. We also saw a diffusion of GIS in health to every region of the world. Google Earth
and Google Maps and smartphones, along with a range of other geo-enabled mobile devices, are
changing how we access and view digital spatial data in the early part of this decade. The story of
health insurance reform in the U.S. is now unfolding.


What is the value of geospatial methods in health research?

These methods are important because they provide essential support for health
research that cannot be obtained by other means.


The science of health research relies on data and there are different ways of viewing
data.

Here, we have two tables of data on students in classrooms. These tables are
identical, that is, we have the same number of students with the same names and each
student with the same name has the same grade in the two classes.


The basic statistical views of the data would therefore also be the same for
both classes (same N, same frequency distribution, same median grade, same range,
same relationship between grades and gender, and so on).


However, when we look at maps of the classrooms, we see that the classes are not the
same. In one, the grades are apparently randomly distributed. In the second, there is a
clear pattern of educational outcomes. These are only two of the many maps we
could create from the same two source databases.

GIS and geospatial technologies are the means for creating this spatial view. This is
an important view because it brings us closer to what exists in the real world. The
processes and mechanisms affecting our health occur in space and time.

Geospatial technologies enable both this view of data and the spatial analytic tools
that help us to understand whether or not these patterns could have occurred by
chance, to account for spatial dependencies in the phenomena of interest, and to
investigate how processes, mechanisms, and health outcomes differ from place to
place. Spatial analytic models, as we will see, incorporate distance as a variable.


Now that we have reviewed some of the key concepts, we can explore how geospatial
methods are used in health research by walking through the steps we would take to
understand a health problem. At each step, we will be looking at examples drawn
from the literature. These examples will not all be drawn from a single study, but they
should provide you with a sense of how geospatial analysis of a health question
would proceed.


The first step in any geospatial analysis of health is measuring location, that is,
putting the features of interest onto the earth’s surface.

Today, this is accomplished by using the Global Positioning System, through
address-match geocoding, or by acquiring digital geospatial data from secondary public and
private sources, often by downloading data from a web site.

These data can be added to (or in the case of address-match geocoding, created using
the software functions of) a GIS application.


This map shows us part of the rich history of geospatial methods implemented before
the advent of the new technologies. The map, from 1819, plots yellow fever cases
near Old Slip in Lower Manhattan. A version of this map, which is in the public
domain, is in the U.S. National Library of Medicine along with earlier yellow fever
maps for cities dating to 1796. These maps show highly disaggregate, individual-level
data mapped by residence. They were made to investigate the cause of yellow
fever at a time when its etiology was unknown.

The cases on this map are also labeled to show the temporal order of “sickening” so
we know that time-space patterns were of interest in 1819. Unfortunately, we will not
be able to discuss time and time-space methods today. I hope you will be able to
consider these on another occasion.

There are strong connections between medical mapping and the development of
thematic cartography. Thematic maps are maps that show a spatial pattern. They are a
form of statistical graphic, unlike topographic maps, which serve other purposes.
A map like this makes the data spatially extensive; we can observe the
phenomenon in many places.

[E. W. Gilbert (1958). Pioneer maps of health and disease in England. Geographical
Journal 124 Part 2:172-183; Gary W. Shannon (1981). Disease mapping and early
theories of yellow fever. The Professional Geographer (ending p. 227); Howard Brody,
Michael Russell Rip, Peter Vinten-Johansen, Nigel Paneth, and Stephen Rachman (2000).
Map-making and myth-making in Broad Street. The Lancet.]


Although the spatial view of data is very powerful, it is also limited and potentially
misleading. This is a map of motor vehicle collisions occurring on federal and state
roads in Connecticut in 1995 and 1996 [Cromley, EK, (2007), Risk Factors
Contributing to Motor Vehicle Collisions in an Environment of Uncertainty.
Stochastic Environmental Research and Risk Assessment (SERRA), 21;5:473].
Motor vehicle collisions are an important cause of traumatic injury in the U.S. and
many other countries around the world. Motor vehicle collisions are a leading cause
of death in the United States, especially among people under 35 years of age, and
rank third overall behind cancer and diseases of the heart in terms of years of life lost.
Collision databases are interesting because they are one of the few sources of
surveillance data reporting individual, environmental, and behavioral elements of the
health event.

The data for the study were drawn from the Connecticut CODES Project. CODES
stands for Crash Outcome Data Evaluation System. These projects were funded by
the National Highway Traffic Safety Administration. The Connecticut Department of
Transportation geocoded the collision locations by longitude/latitude. Here, we see a
map showing all of the collisions on federal and state roads in Connecticut in
1995 and 1996.
This map shows the limitations of mapping and visualization. Simply geocoding and
mapping collisions yields very little insight at the state level or for particular
segments of the highway system.


Once we have located phenomena of interest, the next step in an analysis often
involves calculating distance.

Distance, a measure of separation in space, is a key geographic concept and a
required measure in many geospatial methods.

There are various ways we can measure distance. If we are using locations on the
geographic grid, we compute spherical distance measuring the arc of the great circle
connecting two points on the globe. If we are dealing with projected data, we can
calculate Euclidean or straight-line distance, taxicab distance, or distance along a
street network.

GIS software functions enable us to calculate distances and spatial analytic functions
often allow us to specify how distances will be calculated.
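
To make these options concrete, here is a minimal Python sketch of three of the measures just mentioned. The function names and the mean earth radius of 6,371 km are assumptions chosen for illustration, not taken from any particular GIS package, and the spherical calculation uses the haversine formula, which is one common way to compute great-circle distance.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean earth radius

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def euclidean(p, q):
    """Straight-line distance between two projected (x, y) points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

def taxicab(p, q):
    """Distance travelled along a rectangular grid of streets."""
    return abs(q[0] - p[0]) + abs(q[1] - p[1])

print(euclidean((0, 0), (3, 4)))  # 5.0
print(taxicab((0, 0), (3, 4)))    # 7
```

The Euclidean and taxicab functions assume projected coordinates in a common linear unit; applying them directly to longitude and latitude would mix angular units and distort the result.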


Different methods for calculating distance will yield different results.

Here we apply three different methods for calculating the distance between a person’s
home and a food store. Distance measures of this type are widely used in studies of
the food environment and obesity.

The Euclidean distance is the shortest, by definition. The circle shows a quarter-mile
buffer around the home location as the crow flies. Note that Store 2 is inside the
buffer and Store 1 is outside of it.

When we measure distance the way a person might actually walk it, along the street
network, we see that Store 1, which is outside the buffer, is actually closer than Store 2.
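
A small sketch can reproduce this kind of reversal. Everything in it is hypothetical: the coordinates (in miles), the street segments, and the 0.25-mile buffer are invented for illustration and are not the data behind the slide. Network distance is computed with Dijkstra's shortest-path algorithm.

```python
import heapq
import math

# Hypothetical coordinates in miles; home is at the origin.
coords = {
    "home": (0.0, 0.0),
    "corner_a": (0.0, 0.3),
    "corner_b": (0.2, 0.3),
    "store_1": (0.0, -0.3),
    "store_2": (0.2, 0.0),
}
# Hypothetical streets; there is no direct street from home to store_2.
streets = [("home", "corner_a"), ("corner_a", "corner_b"),
           ("corner_b", "store_2"), ("home", "store_1")]

def euclidean(a, b):
    (x1, y1), (x2, y2) = coords[a], coords[b]
    return math.hypot(x2 - x1, y2 - y1)

def build_graph(segments):
    """Undirected graph whose edge lengths are the segment lengths."""
    graph = {}
    for a, b in segments:
        length = euclidean(a, b)
        graph.setdefault(a, []).append((b, length))
        graph.setdefault(b, []).append((a, length))
    return graph

def network_distance(graph, source, target):
    """Shortest path length along the street network (Dijkstra)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            return d
        if d > dist.get(node, math.inf):
            continue  # stale heap entry
        for nbr, length in graph.get(node, []):
            nd = d + length
            if nd < dist.get(nbr, math.inf):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return math.inf  # target not reachable

graph = build_graph(streets)
for store in ("store_1", "store_2"):
    print(store, euclidean("home", store), network_distance(graph, "home", store))
```

In this invented layout, store_2 lies inside the 0.25-mile straight-line buffer but is about 0.8 miles away on foot, while store_1 lies outside the buffer but is only 0.3 miles away along the street.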


Note that there are different ways we can model the effect of distance.

These distance decay curves illustrate three commonly used functions.
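
The notes do not name the three curves on the slide, so the following sketch simply assumes three forms that are often used to model distance decay: an inverse power function, a negative exponential function, and a Gaussian function. The parameter names and default values are illustrative assumptions.

```python
import math

def inverse_power(d, beta=2.0):
    """Weight falls off as 1 / d**beta; undefined at d = 0."""
    return 1.0 / (d ** beta)

def negative_exponential(d, beta=1.0):
    """Weight falls off as exp(-beta * d); equals 1 at d = 0."""
    return math.exp(-beta * d)

def gaussian(d, bandwidth=1.0):
    """Bell-shaped decay controlled by a bandwidth; equals 1 at d = 0."""
    return math.exp(-0.5 * (d / bandwidth) ** 2)

for d in (0.5, 1.0, 2.0):
    print(d, inverse_power(d), negative_exponential(d), gaussian(d))
```

All three weights decline as distance increases, but they differ in how quickly nearby locations lose influence, which is why the choice of function and its parameter can change the results of an analysis.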


One way we can try to make sense of the mass of data we have before us is to
aggregate the data.

Geospatial methods help us to group observations based on spatial relationships.

The point-in-polygon method of aggregation has been one of the most widely used
and usually the polygons represent political/administrative units. Here we take points
like cases of a health outcome and we use geospatial methods to determine which
polygon or area (county, census tract, and so on) the point lies within. We add up the
number of points within each area and we make a map based on areas.
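
As a rough sketch of this counting step, the code below tests each point against each polygon with the standard ray-casting rule and tallies the cases per area. The two square "tracts" and the case locations are hypothetical, and a GIS would normally perform this operation with its own spatial join function.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count how many polygon edges a ray to the right crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def count_points_by_area(points, areas):
    """Return the number of points falling within each named polygon."""
    counts = {name: 0 for name in areas}
    for x, y in points:
        for name, polygon in areas.items():
            if point_in_polygon(x, y, polygon):
                counts[name] += 1
                break  # assign each point to at most one area
    return counts

# Hypothetical areas and case locations.
areas = {
    "tract_a": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "tract_b": [(2, 0), (4, 0), (4, 2), (2, 2)],
}
cases = [(0.5, 0.5), (1.5, 1.0), (3.0, 1.0), (5.0, 1.0)]
print(count_points_by_area(cases, areas))  # {'tract_a': 2, 'tract_b': 1}
```

The last case lies outside both areas and is not counted, which is a reminder to check how many points fail to match any polygon.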

It is also possible to perform point-in-polygon analysis based on areas whose
boundaries are determined in other ways. From environmental sampling, we might be
able to identify regions where contamination is present or absent and we could group
individuals based on whether their residences were in a contamination area or not.
We could also define regions based on flows in space or distance from an origin. In
this way, we could find points within 30 miles of a health services facility, as the
crow flies or based on street network distance.

We can also group polygons. For example, we might group census tracts into
counties or into health services regions.


The approach taken in the collision study was to find all collisions within a specified
distance from each collision site on the same road where the collision occurred and to
tie the kernel distance to some aspect of the health event process. This approach was
possible because the collision locations were available as points.

Finding collisions within a specified distance from a collision of interest is
similar to creating a “box-shaped” kernel [A. Stewart Fotheringham, Chris Brunsdon,
and Martin Charlton (2002). Geographically Weighted Regression: The Analysis of
Spatially Varying Relationships. Chichester, England: John Wiley & Sons Ltd.]. For
fixed object collisions involving a car striking a fixed object like a telephone pole, a
distance of one quarter mile from the collision site was used. Collisions in a group
would therefore be at most one half mile from each other. At a travel speed of 65
miles per hour, reaction and braking deceleration time would result in a total stopping
distance of 345 feet if the road surface is dry. A collision place or kernel along a
highway would have to be at least this length to capture the environment of the
collision event. The quarter mile distance of 1,320 feet was selected as a reasonable
distance for modeling collision places on state and federal highways.

Kernels were generated around the site of every fixed-object collision. Because some
collisions occurred close to each other, there was collinearity in the membership of
collisions in the kernels. To address this problem, the collision kernel that had the
highest frequency of fixed-object collisions was identified and all of the collisions
within that kernel were identified and set aside. The collision place with the next
highest frequency was then selected. This process was repeated.
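
The selection procedure just described can be sketched as a simple greedy loop. This is an illustration under simplifying assumptions, not the code used in the study: collisions are placed on a single road by a hypothetical milepost value in miles, the kernel is a quarter-mile box on either side of a fixed-object collision, and ties are broken by taking the first candidate.

```python
def select_collision_places(collisions, radius=0.25, max_places=10):
    """collisions: list of (milepost, is_fixed_object) tuples on one road."""
    remaining = list(collisions)
    places = []
    while remaining and len(places) < max_places:
        # Kernels are centered only on fixed-object collisions.
        centers = [c for c in remaining if c[1]]
        if not centers:
            break

        def members(center):
            return [c for c in remaining if abs(c[0] - center[0]) <= radius]

        def fixed_count(center):
            return sum(1 for c in members(center) if c[1])

        best = max(centers, key=fixed_count)
        group = members(best)
        places.append({"center": best[0],
                       "fixed_object": fixed_count(best),
                       "total": len(group)})
        # Set aside every collision in the chosen kernel before repeating.
        remaining = [c for c in remaining if c not in group]
    return places

# Hypothetical collisions: (milepost, fixed-object collision?)
collisions = [(1.0, True), (1.1, True), (1.2, False), (1.3, True),
              (5.0, True), (5.1, False), (9.0, True)]
for place in select_collision_places(collisions):
    print(place)
```

With these invented data, the first place selected is centered at milepost 1.1 and contains three fixed-object collisions out of four collisions of all types; the remaining places are then chosen from the collisions that were not set aside.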


This map shows the 10 kernel areas with the highest number of fixed object
collisions with the symbol representing the total number of collisions of all types
occurring at those places.

Due to the scale of the map, some collision places appear to be overlapping but they
are in fact geographically distinct. These 10 places accounted for almost 3 percent of
all fixed object crashes occurring on state and federal roads. At every one of these
places, fixed object crashes accounted for a higher proportion of all collisions
occurring in the place than the proportion for the state as a whole.


The approach to aggregating the collision data spatially was designed to avoid problems with
using political/administrative units like counties or census tracts. Those units have boundaries
that are arbitrarily determined without regard to the underlying distribution of the phenomena,
masking spatial patterns. Going back to the example of the classroom, if we decided to group
students as on the left without looking at the pattern of grades first, we would find no
differences across the three zones. If we decided to group students as on the right, we would
see a pattern.

How data are aggregated geographically can also affect the results of statistical analyses. This
is the modifiable areal unit problem (MAUP). This problem has two components: a scale problem and a
zonation problem. The scale problem arises when data are progressively aggregated into
fewer, larger zones. The zonation problem arises when a study region can be subdivided in
different ways at the same scale. The figure here shows a zonation problem. There is a
substantial literature on MAUP. [Paul Longley and Michael Batty, Eds. (1996). Spatial Analysis:
Modelling in a GIS Environment. Cambridge, England: GeoInformation International; A. Stewart
Fotheringham, Chris Brunsdon, and Martin Charlton (2002). Geographically Weighted Regression:
The Analysis of Spatially Varying Relationships. Chichester, England: John Wiley & Sons Ltd.]

There are no real solutions to the MAUP. Nevertheless, we can make an effort to use data at
the most disaggregate level possible to address the scale problem and we can create regions
with intelligent boundaries (boundaries that reflect the distribution of the phenomenon of
interest) to address the zonation problem. Understanding the nature of the research question
is also very important. GIS can be useful in exploring the spatial structure of a problem and
the computer environment makes it possible to perform sensitivity analyses, sometimes by
creating sampling distributions.
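
A toy version of the classroom example shows how the zonation problem arises. The grid of grades and the two ways of drawing zones are hypothetical; the point is only that the same data, grouped into the same number of zones, can show either no difference or a clear difference.

```python
from statistics import mean

# Hypothetical classroom: grades fall from the front row (top) to the back row.
grades = [
    [90, 90, 90, 90],
    [80, 80, 80, 80],
    [70, 70, 70, 70],
    [60, 60, 60, 60],
]

def zone_means(grid, zone_of):
    """Average the grid values within zones defined by zone_of(row, col)."""
    groups = {}
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            groups.setdefault(zone_of(r, c), []).append(value)
    return {zone: mean(values) for zone, values in groups.items()}

# Two zonations at the same scale (two zones of eight students each).
by_columns = zone_means(grades, lambda r, c: "left" if c < 2 else "right")
by_rows = zone_means(grades, lambda r, c: "front" if r < 2 else "back")

print(by_columns)  # {'left': 75, 'right': 75}
print(by_rows)     # {'front': 85, 'back': 65}
```

Repeating this kind of calculation over many alternative zonings is one way to carry out the sensitivity analyses mentioned above.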


There are two types of spatial samples.

A sample OF space involves sampling some locations from the set of all places.
When we select sites for environmental monitoring or we select one or more study
communities, we are taking a sample from the set of all places. Because there are an
infinite number of places, it is time consuming and expensive to look everywhere and
it is probably not necessary to do so. Nevertheless, the design of environmental
monitoring systems or the approach to study community selection raises important
sampling questions. How many sites are needed? Where should they be located?

Because it is time consuming and expensive to monitor the environment, there are
often too few monitoring stations and these are not always located in the best places
to create an accurate model of environmental conditions, or even a model that covers
the entire study region.

A sample IN space involves sampling from a geographically distributed population.
Every sample from a population is implicitly a spatial sample. A random sample of
all people will not be a random sample of all places unless people are uniformly
distributed. This is rarely the case. This means that there may be no one residing in an
area of interest or that our sample cannot capture the variation of interest.

For example, let’s say we are testing a vaccine for a health problem like Lyme
disease. The risk for Lyme disease is greater in woodland areas and lower in more
developed places. A simple random sample would likely yield a higher number of
subjects who live in more developed places where the risk for acquiring Lyme
disease is lower. What are the implications of this for how these kinds of trials are
designed? [M. Ali, M. Emch, M. Yunus, and J. Clemens, “Modeling Spatial
Heterogeneity of Disease Risk and Evaluation of the Impact of Vaccination,”
Vaccine, Vol. 27, pp. 3724-3729. 2009.]
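
A little arithmetic makes the sampling issue concrete. The population figures below are invented for illustration, and equal allocation across strata is only one of several possible alternatives to a simple random sample.

```python
def expected_srs_counts(population, sample_size):
    """Expected subjects per area under a simple random sample of people."""
    total = sum(population.values())
    return {area: sample_size * count / total
            for area, count in population.items()}

def equal_allocation(population, sample_size):
    """Stratified alternative: the same number of subjects from each area type."""
    per_stratum = sample_size // len(population)
    return {area: per_stratum for area in population}

# Hypothetical population: most people live in developed, lower-risk places.
population = {"woodland": 10_000, "developed": 90_000}

print(expected_srs_counts(population, 1000))  # {'woodland': 100.0, 'developed': 900.0}
print(equal_allocation(population, 1000))     # {'woodland': 500, 'developed': 500}
```

Under these assumptions, a simple random sample of 1,000 people would be expected to include only about 100 residents of the higher-risk woodland areas, whereas a design stratified by area type could enroll many more.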


Geospatial methods can help us explore the spatial basis of evidence.

Because so many health studies have been conducted in a limited number of places with a limited
number of participants, meta-analysis methods have been developed to improve statistical power in the
analysis of data. These meta-analyses generally do not consider where the studies have been conducted.

Dr. Blair T. Johnson of the Department of Psychology and the Center for Health, Intervention, and
Prevention at the University of Connecticut has been working on meta-analyses of published studies of
trials in African nations. Strict inclusion criteria were applied in selecting the studies. They all
evaluated efficacy of interpersonally-delivered HIV/AIDS prevention efforts initiated from 1986 to
2008 on a behavioral outcome relative to a control group or baseline assessment. There were 93
studies identified, some of them with multiple parts. This map shows where the studies were
conducted. For some studies, location is reported by city or town, for others, by region, and for others
only the country within which the study took place was reported. Some studies involved more than one
place and each place is included in the count. The counts are mapped to the centroids of the city,
region, or country and the size of the symbol corresponds to the count.

Locating the studies permits Dr. Johnson and his team to develop spatial meta-analyses to analyze
contextual factors such as support for human rights, women’s economic power, and levels of income
as factors which might affect intervention efficacy and to investigate whether or not there are spatial
patterns in intervention efficacy with more efficacious interventions being conducted in particular
areas. It also helps to identify where trials have been conducted and where they have not. For more
information, contact Dr. Blair T. Johnson.

[Johnson, B. T., Picho, K., Medina, T. B., Warren, M. R., & Ballester, E. (2011, September).
Active Ingredients in Interventions for HIV Prevention in African Nations: A Meta-Analysis.
Paper presented in B. T. Johnson & A. Good (Co-Chairs), “Advancing the Science of
Change,” 25th annual meeting of the European Health Psychology Society, Crete, Greece.]


Perhaps the most dominant use of geospatial methods in health to date has been
analyzing neighborhood context of health events. This involves using geospatial
methods to place an individual in an area and take into account area characteristics in
explaining the health of the individual. Hierarchical modeling is the primary method
used in much of this research. This approach has made health studies more
geographic, but not yet more spatial. Most hierarchical models do not account for
spatial effects, that is, spatial dependencies across neighborhoods [B. Chaix, J. Merlo,
P. Chauvin, Comparison of a Spatial Approach with the Multilevel Approach for
Investigating Place Effects on Health: The Example of Healthcare Utilisation in France,
Journal of Epidemiology and Community Health, 2005;59].

Many health studies are limited in time (cross-sectional in nature) and in place
(focused on one locality). As a result, we sometimes see inconsistent results in studies
conducted in different localities. Does the presence of parks lead to more physical
activity or not?

In addition, we have difficulties identifying groups of places with the same
configurations of factors affecting health outcomes.

Despite all of this detailed, spatially extensive data, many analysts still fall back on
aggregate or global methods of analysis which do not help us find key spatial patterns
of interest.


Geospatial methods help us to analyze spatial patterns and processes. Spatial
clustering methods help us to analyze spatial patterns in data. There is a wide range
of clustering methods. Some methods are global methods used to identify whether
there is an overall pattern of clustering in the data. Others are local methods used to
identify the number and locations of individual clusters.

Spatial clustering techniques rely on spatial weights describing the proximity of
neighboring areas or health events.


Global statistics summarize data for entire regions, yielding a single statistic which
can provide misleading interpretations of local relationships.

Local statistics, on the other hand, summarize data for individual places within entire
regions yielding multiple statistics, one for each place. These are potentially
interesting when mapped (they are spatial) and they are useful for exploratory data
analysis, confirmatory analysis, and building more accurate global models.

The most recent versions of widely used GIS software packages are incorporating
more spatial statistical functions for both global and local analyses. For example, we
can perform global and local clustering analyses, or perform ordinary least squares
regression and geographically weighted regression analysis in which spatial variation
in the regression parameters may be observed.


Moran’s I is a global measure of spatial autocorrelation. Positive spatial
autocorrelation means that like values are clustered together. Negative spatial
autocorrelation means that high and low values alternate in a kind of checkerboard
pattern. If a significant pattern of spatial autocorrelation is not found, the pattern of
values cannot be distinguished from a random one.
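
To show how the statistic behaves, here is a minimal sketch that computes Moran’s I on a small grid, in the spirit of the classroom example. It assumes binary rook-contiguity weights (cells sharing an edge are neighbors); the grids are hypothetical, and the significance test, which is usually done by permutation, is not shown.

```python
def rook_neighbors(rows, cols):
    """Map each cell index to the indices of cells sharing an edge with it."""
    nbrs = {}
    for r in range(rows):
        for c in range(cols):
            candidates = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
            nbrs[r * cols + c] = [rr * cols + cc for rr, cc in candidates
                                  if 0 <= rr < rows and 0 <= cc < cols]
    return nbrs

def morans_i(values, neighbors):
    """Global Moran's I with binary weights (w_ij = 1 for neighbors)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(len(js) for js in neighbors.values())  # sum of all weights
    num = sum(dev[i] * dev[j] for i, js in neighbors.items() for j in js)
    den = sum(d * d for d in dev)
    return (n / s0) * (num / den)

nbrs = rook_neighbors(4, 4)
clustered = [1, 1, 1, 1,
             1, 1, 1, 1,
             0, 0, 0, 0,
             0, 0, 0, 0]
checkerboard = [(r + c) % 2 for r in range(4) for c in range(4)]

print(morans_i(clustered, nbrs))     # about 0.67: like values cluster together
print(morans_i(checkerboard, nbrs))  # about -1.0: high and low values alternate
```

Both grids contain eight ones and eight zeros, so their non-spatial summaries are identical; only the arrangement differs.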

Moran’s I was used here to investigate whether or not there was a pattern of overall
clustering of average depressive symptom scores among community-living elderly
participants in a telephone survey conducted in New Jersey. A significant pattern of
positive spatial autocorrelation was found. [Ellen K. Cromley, Maureen Wilson-Genderson,
Rachel A. Pruchno. 2012. Neighborhood characteristics and depressive
symptoms of older people: Local spatial analyses. Social Science & Medicine.]


Moran’s I and other global clustering measures do not identify the number and
locations of individual clusters.


The LISA statistic gives us local indicators of spatial autocorrelation. It measures the
association between the value at a particular place and values for nearby areas.

This map shows the locations of census tracts with high average depressive symptom
scores that are surrounded by other tracts with high depressive symptom scores.
These clusters are statistically significant, meaning that it is unlikely that these
clusters occurred by chance.

This method can also identify significant clusters where tracts with low average
scores are surrounded by tracts with low average scores, or tracts with high average
scores are surrounded by tracts with low average scores, and so on.
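
The sketch below computes a local Moran statistic for each area and labels the type of association (High-High, Low-Low, High-Low, Low-High). It assumes row-standardized weights, and the five tracts, their scores, and their neighbor lists are hypothetical. It does not test significance, which in practice is usually assessed with permutations.

```python
def local_moran(values, neighbors):
    """Return {area: (local I, association label)} using row-standardized weights."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(d * d for d in z) / n  # variance of the values
    results = {}
    for i, js in neighbors.items():
        if not js:
            continue  # areas without neighbors have no spatial lag
        lag = sum(z[j] for j in js) / len(js)  # mean deviation of the neighbors
        label = ("High" if z[i] > 0 else "Low") + "-" + ("High" if lag > 0 else "Low")
        results[i] = (z[i] * lag / m2, label)
    return results

# Hypothetical average scores for five tracts arranged in a line.
scores = [9, 8, 7, 1, 2]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

for tract, (local_i, label) in local_moran(scores, neighbors).items():
    print(tract, round(local_i, 2), label)
```

Here tracts 0 and 1 are labeled High-High, tracts 3 and 4 Low-Low, and tract 2 High-Low, because its high score sits next to neighbors whose average is below the overall mean.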

With this understanding of the data, we can design better studies investigating
neighborhood contextual effects on depression, intervention trials, and service
delivery systems. [Pruchno, R.A., Wilson-Genderson, M., Cartwright, F.P. 2011. The
Texture of Neighborhoods and Disability Among Older Adults, The Journals of
Gerontology, Series B: Psychological Sciences and Social Sciences.]



In addition to analyzing spatial patterns in data, we can use spatial statistical methods
to investigate processes. Spatial statistical methods take into account locations and
spatial relationships.

We can consider several examples of local spatial statistics.


Returning to the study of collisions, we can look at how local statistics were used to
understand processes occurring at the various collision sites.

Local proportions and odds ratios were calculated to assess the importance of
individual, environmental, and behavioral factors associated with collisions at
different places.

Are the same factors associated with outcomes across all places?


For the geographically weighted proportions, the area defining the collision place is
examined and a weighted proportion of all collisions in this area that are fixed object
collisions is calculated by the formula shown, where w_ij is the weight assigned to
collision j for the geographically weighted proportion at location i [Fotheringham et
al. 2002]. The term x_j is a binary indicator variable taking on the value of 1 if the
collision has the specified attribute (for example, collision type, weather, road
surface) or the value of 0 if it does not. For this analysis using box kernels, w_ij is
equal to 1 for collisions within the kernel. Collisions outside the kernel have a weight
of 0.

Note that other weighting formulas are available, most commonly, weights using the
Gaussian distance function and a bandwidth.
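
The slide formula is not reproduced in these notes, so the sketch below assumes the usual form of a geographically weighted proportion: the weighted sum of the indicator values divided by the sum of the weights. The function names, the distances, and the indicator values are hypothetical; both the box kernel and a Gaussian alternative are shown.

```python
import math

def box_weight(distance, radius):
    """Box kernel: weight 1 inside the kernel, 0 outside."""
    return 1.0 if distance <= radius else 0.0

def gaussian_weight(distance, bandwidth):
    """Gaussian kernel: weight declines smoothly with distance."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)

def gw_proportion(distances, indicators, weight_fn):
    """Weighted proportion: sum(w * x) / sum(w) over all collisions."""
    weights = [weight_fn(d) for d in distances]
    total = sum(weights)
    if total == 0:
        raise ValueError("no collisions receive weight at this location")
    return sum(w * x for w, x in zip(weights, indicators)) / total

# Hypothetical distances (miles) from a collision place, and whether each
# collision has the attribute of interest (1) or not (0).
distances = [0.0, 0.1, 0.2, 0.4, 0.9]
indicators = [1, 1, 0, 1, 0]

p_box = gw_proportion(distances, indicators, lambda d: box_weight(d, 0.25))
p_gauss = gw_proportion(distances, indicators, lambda d: gaussian_weight(d, 0.25))
print(p_box)    # 2 of the 3 collisions inside the kernel have the attribute
print(p_gauss)  # nearby collisions count more, distant ones count less
```

With the box kernel, the result is simply the ordinary proportion among collisions inside the kernel; the Gaussian kernel lets every collision contribute, with influence shrinking as distance grows.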

For the geographically weighted odds ratios, the collision type (fixed object or not) is
compared against each of the other attributes (individual characteristics such as
driver age and sex, environmental characteristics such as weather) in a 2 x 2
contingency table for collisions at each collision place and the local odds ratio is
calculated based on the formula shown.
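
Because the slide formula is not included in these notes, the following sketch assumes the standard cross-product form of the odds ratio for a 2 x 2 table, (a x d) / (b x c). The counts are hypothetical, and the confidence interval needed to judge significance is not shown.

```python
def two_by_two(records):
    """records: iterable of (is_fixed_object, has_attribute) booleans."""
    a = b = c = d = 0
    for fixed, attribute in records:
        if fixed and attribute:
            a += 1      # fixed object collision with the attribute
        elif fixed:
            b += 1      # fixed object collision without it
        elif attribute:
            c += 1      # other collision with the attribute
        else:
            d += 1      # other collision without it
    return a, b, c, d

def odds_ratio(a, b, c, d):
    """Cross-product odds ratio for a 2 x 2 contingency table."""
    if b == 0 or c == 0:
        raise ValueError("odds ratio is undefined when b or c is zero")
    return (a * d) / (b * c)

# Hypothetical place where the attribute is common among ALL collision types.
records = ([(True, True)] * 18 + [(True, False)] * 2
           + [(False, True)] * 45 + [(False, False)] * 5)
a, b, c, d = two_by_two(records)
print(a / (a + b))             # 0.9 of fixed object collisions have the attribute
print(odds_ratio(a, b, c, d))  # 1.0: no special association with fixed object collisions
print(odds_ratio(30, 10, 20, 40))  # 6.0: a strong association
```

The first example shows why a local proportion and a local odds ratio can tell different stories: an attribute can be very common at a place without being any more common among fixed object collisions than among other collisions there.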


I realize these tables make it difficult to read the individual cell entries. Focus on the pattern
of significant values highlighted in red.

For all collisions in the state as a whole (the bottom row), only 24% occurred when it was
raining, sleeting, or snowing, 65% occurred on dry roads, and 70% occurred during
daylight conditions. This would suggest that environmental conditions are not that important
in explaining motor vehicle collision risk. With respect to age, drivers or pedestrians under 25
or older than 44 were at fault in 54% of collisions and 61% of drivers were male. These
individual age and sex characteristics are consistent with the characteristics of collisions
reported in the literature. The analysis of local proportions and odds ratios, however, reveals
different and variable patterns for high collision frequency locations in the state.

At Place 2 and Place 6 (rows 2 and 6), for example, very high numbers of crashes were fixed
object collisions and very many of these occurred when road surfaces were wet and when drivers
were operating vehicles too fast for conditions.

Configurations of conditions at other high frequency sites are different, however.

At Place 5, the characteristics of crashes across most factors were different from
characteristics of crashes in the state as a whole and even at other high frequency fixed object
crash places.

At Place 10, the characteristics were more similar to all crashes occurring across the state.

Different places with lots of the same event were different from the state as a whole based on
different factors.


Local odds ratios reveal additional patterns of interest and the value of using odds
ratios in addition to local proportions.

Place 6 for example was not significant for rain and snow. Place 6 had a high
proportion of collisions occurring on days with precipitation and wet roads but the
local odds for weather and road surface were not significant. This suggests that the
collisions of other types occurring at Place 6 were also affected by weather and road
surface, not just the fixed object collisions.

At Place 1, Place 3, Place 7, and Place 8, fixed object collisions were more than 10
times more likely to be associated with driving too fast for conditions. These are
places where enforcement interventions (modifying not individual characteristics or
environmental characteristics but how people behave in the environment) might be
appropriate.

The use of local statistics in the collision study suggests that spatially varying
relationships may be at work.

We can use geospatial methods to map data on the amount of time spent outdoors and
this might vary spatially. We can use geospatial techniques to monitor, model, and
map geographical variations in air quality.

It is also possible, however, that the relationship between the amount of time spent
outdoors, air quality, and health outcomes itself varies from place to place. If this is
the case, we need to be able to find the set of places or spatial domain where
particular processes are at work.


A neglected area in research on human health outcomes is the role of health services.
So much health research is designed as if the structure and functioning of the health
care system has nothing to do with the health outcomes we observe. We need
explicitly to address how the structure of the various health care systems in the U.S.
and state and local regulations on insurance, practitioner licensing, and other
components affect the patterns of health and disease we are trying to understand.

Geospatial methods can play a role here, too, especially by using spatial interaction
models to evaluate accessibility to existing services. [E. K. Cromley & S. L.
McLafferty, 2012, GIS and Public Health, Second Edition, New York: Guilford.]
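One simple member of the spatial interaction family is a gravity-type accessibility index, in which each service site contributes its capacity discounted by distance. The sketch below is a minimal illustration under assumed values: the supply figures, distances, and the distance-decay exponent beta are all hypothetical, and other functional forms are used in practice.

```python
def gravity_accessibility(supply, distance, beta=2.0):
    """Gravity-type accessibility: A_i = sum over j of S_j / d_ij ** beta.

    supply:   capacity S_j of each service site
    distance: distance[i][j] from demand site i to service site j (must be > 0)
    beta:     distance-decay exponent (an assumed value, normally estimated)
    """
    scores = []
    for row in distance:
        if any(d <= 0 for d in row):
            raise ValueError("distances must be positive")
        scores.append(sum(s / d ** beta for s, d in zip(supply, row)))
    return scores

# Hypothetical example: two demand sites, two clinics.
supply = [10, 20]
distance = [[1.0, 2.0],
            [2.0, 1.0]]
access = gravity_accessibility(supply, distance)
print(access)
```

The second demand site scores higher because it is closer to the larger clinic.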

We can also use normative modeling techniques to evaluate optimal service delivery.


Normative models have several key elements. The objective function describes the
“best” result. It describes what we want to minimize or maximize. For example, we
might want to allocate patients from a set of demand sites to a set of clinics to
minimize total travel cost. This is an example of an allocation problem. The set of
clinics is fixed in terms of location and capacity. The population of clients and their
locations are also fixed. We want to assign or allocate patients to clinics to meet the
stated objective.

In meeting this objective, we face a set of constraints. First, every patient must be
served. Second, the number of patients allocated to a clinic must not exceed the
capacity of the clinic. Finally, patient travel to a facility cannot be negative. That is,
we cannot send a negative number of patients to a clinic. This is an example of a
non-negativity constraint. Real-world problems generally have these constraints.

Once the problem is formulated, we can review the data requirements for solving the
problem. We need data on the location and number of patients at each demand site.
We need data on the location and capacity at each supply site. We need to know the
per patient cost of travel from each demand site to each clinic. This can be measured
in terms of distance, time, or money.

Tim Brown, Sara McLafferty, and Graham Moon include a history of location-allocation
planning in health geography in Chapter 28 of A Companion to Health and Medical
Geography (Blackwell, 2010). Shams Rahman and David Smith provide a review of the use
of location-allocation models in health service planning in developing nations in the
European Journal of Operational Research, Volume 123, Issue 3, pp. 437-452 (2000).

We can translate these elements into a numerical formulation of the model.

Here, the objective function is to minimize Z, the total travel cost of the allocation,
where Z equals the number of clients from demand site i (one in the set of all demand
sites) allocated to clinic j (one in the set of all clinics) multiplied by the travel
distance or cost from demand site i to clinic j, summed over all clinics and all
demand sites.

The allocations are subject to the following constraints.

For every demand site i, the number of clients assigned from demand site i to a
clinic site j, summed over all clinics, has to be greater than or equal to the number of
clients r at demand site i. This ensures all clients will be served.

For every clinic j, the number of clients assigned to clinic j, summed over all
demand sites, has to be less than or equal to the capacity q of clinic j. This ensures
that the capacity of every clinic will not be exceeded.

All allocations of patients from a particular demand site i to a particular clinic j have
to be greater than or equal to 0.
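Written out in symbols, the formulation described above might look like the following. The symbols Z, r, and q come from the description; the names x for the allocation and c for the travel cost, and the set names I and J, are my own notational assumptions.

```latex
\begin{aligned}
\text{minimize}\quad & Z = \sum_{i \in I} \sum_{j \in J} c_{ij}\, x_{ij} \\
\text{subject to}\quad
  & \sum_{j \in J} x_{ij} \ge r_i && \text{for every demand site } i \in I, \\
  & \sum_{i \in I} x_{ij} \le q_j && \text{for every clinic } j \in J, \\
  & x_{ij} \ge 0 && \text{for all } i \in I,\ j \in J.
\end{aligned}
```

Here x_{ij} is the number of clients allocated from demand site i to clinic j, c_{ij} is the per-patient travel cost between them, r_i is the number of clients at demand site i, and q_j is the capacity of clinic j.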


We can use GIS to map the locations and numbers of patients at each demand site.

We can map the locations and capacities of the clinics.

We can use GIS to create an origin-destination transportation cost matrix showing the
cost of travel from each demand site to each clinic.
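As a rough sketch of what such a matrix contains, the code below builds one from straight-line (great-circle) distances. This is a simplifying assumption: a GIS would normally measure cost along the road network, and the coordinates here are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def od_cost_matrix(origins, destinations):
    """Rows are demand sites, columns are clinics, entries are distances in km."""
    return [[haversine_km(o[0], o[1], d[0], d[1]) for d in destinations]
            for o in origins]

# Hypothetical (latitude, longitude) pairs for two demand sites and two clinics.
demand_sites = [(0.0, 0.0), (0.0, 1.0)]
clinics = [(0.0, 0.0), (1.0, 0.0)]
matrix = od_cost_matrix(demand_sites, clinics)
for row in matrix:
    print([round(value, 1) for value in row])
```

Travel time or monetary cost could be substituted for distance without changing the shape of the matrix.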

For large problems, we would have to use software such as SAS/OR to
analyze these data. The results, however, can be brought back into the GIS
application for display.
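Before or after running a solver, it can be useful to check an allocation against the constraints and compute its total travel cost. The sketch below does only that. It does not find the optimal allocation, and all of the numbers are hypothetical.

```python
def total_cost(alloc, cost):
    """Objective value: sum of patients allocated times per-patient travel cost."""
    return sum(x * c
               for row_x, row_c in zip(alloc, cost)
               for x, c in zip(row_x, row_c))

def check_allocation(alloc, demand, capacity):
    """Return a list of constraint violations (an empty list means feasible)."""
    problems = []
    for i, row in enumerate(alloc):
        if any(x < 0 for x in row):
            problems.append(f"negative allocation from demand site {i}")
        if sum(row) < demand[i]:
            problems.append(f"demand site {i} is not fully served")
    for j, cap in enumerate(capacity):
        if sum(row[j] for row in alloc) > cap:
            problems.append(f"clinic {j} exceeds its capacity")
    return problems

# Hypothetical problem: two demand sites, two clinics.
demand = [50, 30]            # clients at each demand site
capacity = [40, 60]          # capacity of each clinic
cost = [[2.0, 5.0],          # cost[i][j]: demand site i to clinic j
        [4.0, 1.0]]

plan_a = [[40, 10], [0, 30]]   # fills the nearer clinic first
plan_b = [[20, 30], [20, 10]]  # feasible but sends patients farther
plan_c = [[50, 0], [0, 30]]    # overloads clinic 0

for name, plan in [("A", plan_a), ("B", plan_b), ("C", plan_c)]:
    print(name, total_cost(plan, cost), check_allocation(plan, demand, capacity))
```

Plans A and B both satisfy the constraints, but A has the lower total cost. Plan C looks cheapest only because it violates the capacity constraint.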


The optimal allocation is shown here. The 61,158 patients aggregated to the centroids
of the 21 towns where they reside are allocated to the 9 clinics with a minimum total
travel cost of 301,614.2 miles.

Once we have completed our analyses of health problems, we want to share our
findings and data with others. Geospatial methods play a role in disseminating
information on health.


The Malaria Atlas Project initiated in 2006 illustrates just such a project. Although
malaria is not at present a significant public health threat in the U.S., this project
offers an interesting model of the kinds of work we could be undertaking. Mapping
malaria is challenging because transmission intensity is geographically
heterogeneous. Researchers interested in malaria realized that we did not have a good
map of malaria prevalence (reports of the proportion of a sampled population that is
confirmed positive for malaria parasites), and this was making it difficult to evaluate
the impact of different malaria control projects being implemented in different parts
of the world. They gathered prevalence reports, evaluated them based on strict
inclusion criteria for the map project, and georeferenced them. The MAP researchers
worked with a global database of almost 8,000 survey reports to model a continuous
surface on a 5 x 5 km grid based on initial work to define the global
spatial limits of malaria transmission. In assembling the prevalence reports for the
mapping project, the researchers noted an increasing tendency for national surveys to
be conducted so that they would be representative of all areas within a country, not
just areas of high prevalence, and there were many zero prevalence values recorded
in the reports analyzed. Researchers used a combination of data provided by the
source of reports, an online database of geographic names, online gazetteers, and
paper maps to create point references for the reports. Surveys that could not be
georeferenced or that could be georeferenced only to larger areas (greater than 25 km²)
were excluded. The temporal structure of the data was also taken into account.
The reports covered different times throughout a period from 1985 to 2009. A report
was referenced temporally by the midpoint in decimal years between the start and
end months of the report.
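The temporal referencing can be illustrated with a short sketch. It assumes that a survey runs from the beginning of its start month to the end of its end month and that every month is one twelfth of a year. The project's exact convention may differ.

```python
def decimal_year(year, month):
    """Decimal year at the START of a month (month is 1 to 12)."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    return year + (month - 1) / 12

def survey_midpoint(start_year, start_month, end_year, end_month):
    """Midpoint, in decimal years, between the start and end months of a survey."""
    start = decimal_year(start_year, start_month)
    end = decimal_year(end_year, end_month) + 1 / 12  # end of the final month
    if end < start:
        raise ValueError("survey ends before it starts")
    return (start + end) / 2

# Hypothetical surveys.
full_year = survey_midpoint(2000, 1, 2000, 12)   # January to December 2000
spring = survey_midpoint(2005, 3, 2005, 6)       # March to June 2005
print(round(full_year, 3), round(spring, 3))
```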


Using geostatistical methods and gridded population data, they constructed a
continuous, age-standardized, urban-corrected malaria prevalence map.


This map shows estimated levels of malaria parasite prevalence within the limits of
stable transmission. [Malaria Atlas Project, 2010 map for South and Central America.]

The mapped variable is the age-standardized Parasite Rate, which describes the
estimated proportion of the general population infected at any one time, averaged over
the 12 months of 2010. Values range from 0 to more than 7%.

Estimates are based on data from parasite rate surveys, which feed into a model that
produces a range of predicted values for each location (a probability distribution).
The model also uses data from environmental covariates, which help predict more
accurately, especially in areas far from any actual survey data. The environmental
covariates include rainfall, temperature, land cover, and urban/rural status.

The map shows the mean value of the probability distribution at each location
(approximately 5 km grid cells).

Details of the methods are available on the site.


For some maps, uncertainty maps are also provided. This is the uncertainty map
associated with the modeled distribution shown in the previous slide.

Areas in bright yellow are areas of higher uncertainty in the estimates; the
dark blue areas are areas of lower uncertainty.

[Malaria Atlas Project, Uncertainty in the estimates for South and Central Americas (2010).]


Geospatial data on individuals do pose a risk to the privacy and confidentiality of
health data. Increasingly, the widespread use of geo-enabled devices is creating
concern about unwanted surveillance. The pace of technological change and the
changing patterns of adoption of these technologies make it difficult to address the
problems. Nevertheless, there is a body of literature investigating how geospatial
methods can be used to address privacy and confidentiality concerns. Both spatial
statistical and normative modeling approaches have been investigated. In the notes
for this slide, I have included references to some of the relevant literature. Improved
informed consent procedures that directly address geospatial risks and protections are
greatly needed.
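One family of techniques in this literature is geographic masking, in which point locations are displaced before they are mapped. The sketch below shows a simple random displacement within a ring around the true location. It is an illustration only: the distances are hypothetical, the coordinates are assumed to be in a projected system measured in metres, and published masking methods usually also account for local population density.

```python
import math
import random

def ring_mask(x, y, r_min, r_max, rng):
    """Move a point a random distance between r_min and r_max in a random direction.

    x, y:   projected coordinates in metres (an assumption of this sketch)
    r_min:  minimum displacement, so the true location is never shown
    r_max:  maximum displacement, which limits distortion of the spatial pattern
    rng:    a random.Random instance, passed in so results can be reproduced
    """
    if not 0 <= r_min < r_max:
        raise ValueError("require 0 <= r_min < r_max")
    angle = rng.uniform(0.0, 2.0 * math.pi)
    # Sampling the squared radius spreads points evenly over the ring's area.
    radius = math.sqrt(rng.uniform(r_min ** 2, r_max ** 2))
    return x + radius * math.cos(angle), y + radius * math.sin(angle)

rng = random.Random(42)
true_x, true_y = 1000.0, 2000.0          # hypothetical patient location
masked_x, masked_y = ring_mask(true_x, true_y, 100.0, 500.0, rng)
shift = math.hypot(masked_x - true_x, masked_y - true_y)
print(f"displaced by {shift:.1f} m")
```

The choice of minimum and maximum displacement is a trade-off between protecting individuals and preserving the spatial pattern that the map is meant to show.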

Brownstein JS, Cassa CA, Mandl KD. No place to hide: reverse identification of
patients from published maps. N Engl J Med 2006; 355: 1741-2.

Curtis A, Mills J, Leitner M. Spatial confidentiality and GIS: re-engineering
mortality locations from published maps about Hurricane Katrina. Int J Health Geogr
2006; 5: 44.

Armstrong M, Rushton G, Zimmerman D. Geographically masking health data to
preserve confidentiality. Stat Med 1999; 18: 497-525.

El Emam K, Brown A, AbdelMalik P. Evaluating predictors of geographic area
population size cut-offs to manage re-identification risk. J Am Med Inform Assoc
2009; 16: 256-266.

Wieland SC, Cassa CA, Mandl KD, Berger B. Revealing the spatial distribution of a
disease while preserving privacy. Proc Natl Acad Sci USA 2008; 105: 17608-17613.

Wilson K, Brownstein JS. Early detection of disease outbreaks using the Internet.
CMAJ 2009; 180: 829-831.


Now that we have completed our tour of ways geospatial methods can be used to
advance health research, I would like to thank FAS for funding the visiting
professorship and my colleagues Kristina

and Emilie Stroh for working to
secure the funding and for their support of these activities.


A series of lectures and seminars on spatial statistics is being offered this term. Here
is the schedule. The first lecture was held last week, but the presentation is available
along with the entire schedule and other materials at the following site.