MapBox was able to use
cloudless satellite imagery for the whole world for
less than the
of a PC
At MapBox, we routinely use cloud computing for some of our biggest
jobs, such as rendering a full tile set of the world. We host hundreds
of millions of maps per month on our infrastructure, which is fast and
resilient to massive spikes – and it’s all built on Amazon’s AWS cloud
platform. The availability of minutely metered server infrastructure,
together with smart management tools, is exactly what allows a
comparatively small team like ours to take on big clients.
Our recent launch of a new cloudless MapBox Satellite imagery
layer is a great example of how we leverage cloud computing to
process vast amounts of imagery at negligible cost.
Blue sky thinking
To create a cloudless picture of the world, many shots of the same
area have to be analysed, and the best ones selected and stitched
together into a seamless image of the world. We decided to
take this approach one step further. With massive and affordable
cloud computing power at our fingertips, our satellite team decided
to tap into the full wealth of two years of MODIS data to pick the
best available imagery for each pixel, instead of using a relatively
To create a beautiful, cloudless satellite map of the world in zoom
levels 0 to 8, we processed between 700 and 1,400 source pixels for
every pixel of the final map, together representing just under a
terabyte of imagery.
Remember your first computer? Thanks to the cloud, it probably cost
more than we spent on processing this imagery.
Downloading the world from
All in all, we downloaded over 400,000 images from two different NASA
endpoints: GIBS and LANCE-MODIS. For the first step, we spun up 40
AWS EC2 micro-instances and queued them up using AWS SQS to
download data from NASA’s server and push it into AWS S3. This gave
us a data store that we could later quickly access for image processing.
We co-ordinated closely with NASA’s team and made sure our
requests were paced so that we would not overly tax their servers
and risk getting our IPs banned. In this way, we downloaded over
seven million MODIS tiles from the GIBS WMTS alone, at an average
rate of 30MB/s over 24 hours. Over the course of one weekend, we
downloaded close to the 400,000 source images we needed.
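The download pattern described above (a shared queue of jobs, a pool of workers and a deliberate pace so the source server is not overloaded) can be sketched in a few lines. This is only an illustration under stated assumptions: an in-process queue stands in for AWS SQS, a plain dictionary stands in for AWS S3, and the names run_download_workers, fetch, store and min_interval are hypothetical rather than taken from our pipeline.

```python
import queue
import threading
import time


def run_download_workers(urls, fetch, store, num_workers=4, min_interval=0.0):
    """Drain a queue of URLs with several workers, pacing requests globally.

    Assumptions: `fetch` is any callable that returns the payload for a URL,
    and `store` is a dict standing in for a remote object store.
    """
    jobs = queue.Queue()  # stand-in for a hosted job queue
    for url in urls:
        jobs.put(url)

    pace_lock = threading.Lock()
    next_slot = [time.monotonic()]  # earliest time the next request may start

    def wait_for_slot():
        # Reserve the next free time slot, then sleep until it arrives.
        with pace_lock:
            now = time.monotonic()
            slot = max(now, next_slot[0])
            next_slot[0] = slot + min_interval
        if slot > now:
            time.sleep(slot - now)

    def worker():
        while True:
            try:
                url = jobs.get_nowait()
            except queue.Empty:
                return  # queue drained, worker exits
            wait_for_slot()
            store[url] = fetch(url)  # push the result into the "bucket"

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return store
```

Setting min_interval to, say, 0.05 would cap the whole pool at roughly 20 requests per second however many workers are running, which is the kind of pacing that keeps a shared endpoint healthy.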
De-clouding in the cloud
We took advantage of MODIS rapid response subsets, which LANCE
distributes as 10º tiles – all in all, about 600 tiles cover the entire world.
For each tile location, we had between 700 and 1,400 source images
measuring 4096x4096px. Using a single, beefy 4XL EC2 instance, we
processed five tiles at a time, iterating over each pixel in each image at
each tile location to determine the clearest pixel out of the 700 to 1,400
The result was a set of 90 images for each tile, the first one
containing all the best pixels, the second containing the second-best
pixels and so on, with pixel quality decreasing with each subsequent
image. In a final step, we then merged the top 30 images together to
smooth out artefacts and stored the result back into S3.
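The ranking and merging steps above can be illustrated with a toy example. The sketch below is not our production code: it assumes that a pixel's summed brightness is a usable proxy for cloudiness (clouds are bright, but so is snow, and the darkest pixel may be a shadow), and it assumes that merging means a plain average. Neither choice is spelled out in this article, and the function names are hypothetical.

```python
def brightness(pixel):
    # Assumption: brighter pixels are more likely to be cloud.
    return sum(pixel)


def rank_pixels(stack, score):
    """Return one image per rank: image 0 holds the best pixel at every
    location, image 1 the second best, and so on (lowest score first).

    `stack` is a list of equally sized images, each a list of rows of
    (r, g, b) tuples.
    """
    height, width = len(stack[0]), len(stack[0][0])
    ranked = [[[None] * width for _ in range(height)] for _ in stack]
    for y in range(height):
        for x in range(width):
            candidates = sorted((image[y][x] for image in stack), key=score)
            for rank, pixel in enumerate(candidates):
                ranked[rank][y][x] = pixel
    return ranked


def merge_top(ranked, count):
    """Average the first `count` ranked images, channel by channel."""
    top = ranked[:count]
    height, width = len(top[0]), len(top[0][0])
    merged = []
    for y in range(height):
        row = []
        for x in range(width):
            pixels = [image[y][x] for image in top]
            row.append(tuple(sum(channel) / len(pixels) for channel in zip(*pixels)))
        merged.append(row)
    return merged
```

With a real stack, rank_pixels would be given the 700 to 1,400 source images for a tile location, the result truncated to 90 ranked images, and merge_top(ranked, 30) would produce the smoothed output.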
June 2013 |
TileMill in the cloud
Our final product was to be a map tile set of the entire world in zoom
levels 0-8, a total of 87,381 tiles where each one is 256x256 pixels. The
tile set was going to serve as the low-zoom level portion of the new
To that end, we used the best image for each tile to create a
single virtual raster image covering the world. We then cut the virtual
raster into large 8192x8192px tiles, which we geo-referenced and re-
projected into web mercator. These metatiles serve as the direct source
for rendering our tile set. We handled this step again with a single 4XL
EC2 instance where we had downloaded all relevant images from S3 to
To render the actual tile set, we used MapBox’s open source map
design studio TileMill. In September, we released TileMill version 0.10.0,
which redefined the creative possibilities for web cartography by
supporting compositing layers and features, achieving Photoshop-like
clipping, masking, blurring, or highlighting. The general design for
this support comes from the SVG compositing specification that all
modern web browsers are working toward. We’ve continued our focus
on raster data handling in subsequent TileMill releases, implementing
increasingly advanced analytical and raster styling features.
TileMill is traditionally a desktop application, but thanks to its
client-server architecture with a browser-based UI, it is easy to set up
on a remote server and then control it remotely using a browser. We
ran TileMill on the same 4XL EC2 instance and used the large metatiles
as the source to generate a fairly compact tile set of the world in zoom
levels 0-8 that came in at under 1GB.
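The figure of 87,381 tiles for zoom levels 0-8 follows from the standard web map tiling scheme, in which each zoom level splits every tile into four. A short check, assuming 256-pixel tiles and the WGS84 equatorial circumference of about 40,075 km (the function names are illustrative):

```python
EQUATOR_KM = 40075.017  # WGS84 equatorial circumference
TILE_SIZE_PX = 256


def tiles_at_zoom(zoom):
    # The world is a 2^zoom by 2^zoom grid of tiles.
    return 4 ** zoom


def tiles_in_range(min_zoom, max_zoom):
    return sum(tiles_at_zoom(z) for z in range(min_zoom, max_zoom + 1))


def ground_resolution_km(zoom):
    # Kilometres covered by one pixel at the equator.
    return EQUATOR_KM / (TILE_SIZE_PX * 2 ** zoom)


print(tiles_in_range(0, 8))  # 87381
print(tiles_at_zoom(8))      # 65536
```

Three quarters of the tile set sits in zoom level 8 alone, which is why each extra zoom level roughly quadruples rendering and storage costs.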
The larger cloud
MapBox is powered by open source, largely by Node.js, Backbone.js,
Puppet and Jekyll, and most of our stack is deployed on
Amazon’s high-performance cloud infrastructure. MapBox makes
use of cloud services to scale quickly with demand and avoid
centralisation. At any time, we’re running a cluster of EC2 instances
as our primary application servers. An elastic load balancer divides
traffic between the cluster of running servers and routes around
ones that become unresponsive.
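The idea of dividing traffic across a cluster while routing around unresponsive servers can be shown with a simple round-robin picker. This is a conceptual sketch only, not a description of how Amazon's load balancer works internally. The class name and the is_healthy callback are hypothetical.

```python
import itertools


class RoundRobinBalancer:
    """Hand out servers in turn, skipping any that fail a health check."""

    def __init__(self, servers, is_healthy):
        self._servers = list(servers)
        self._is_healthy = is_healthy  # callable: server -> bool
        self._cycle = itertools.cycle(self._servers)

    def pick(self):
        # Try each server at most once per call before giving up.
        for _ in range(len(self._servers)):
            server = next(self._cycle)
            if self._is_healthy(server):
                return server
        raise RuntimeError("no healthy servers available")


down = {"app-2"}
balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"], lambda s: s not in down)
print([balancer.pick() for _ in range(4)])  # ['app-1', 'app-3', 'app-1', 'app-3']
```

In a real deployment the health check would be a periodic probe rather than a lookup in a set, so that a server rejoins the rotation as soon as it recovers.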
MapBox serves maps from 30 globally distributed edge servers.
With an edge server close by, MapBox maps are fast no matter where
you are. Our geo-redundant infrastructure spans the entire globe for
speed and reliability – even if there’s a massive power outage or natural
disaster, we will always have a fallback.
Alex Barth is an open data expert at Mapbox ( www.mapbox.com )
The world on any given day is surprisingly cloudy
Don Juan Pond, with mountains from the Asgard Range to the left, in Antarctica
TileMill in the Cloud – Australia, Indonesia, and Papua New Guinea
Amazon AWS: Amazon Web Services, Amazon’s cloud computing
platform, which offers both storage and processing capabilities.
AWS EC2: AWS Elastic Compute Cloud – Amazon’s cloud-based
pay-as-you-go processing capacity. 4XL is a ‘high-memory,
quadruple extra large’ instance designed for high throughput
AWS S3: AWS Simple Storage Service – Amazon’s cloud-based pay-
as-you-go storage capacity.
AWS SQS: AWS Simple Queue Service – provides a hosted queue to
automate workflows in AWS.
GIBS: Global Imagery Browse Services. A set of standard services
designed to deliver global, full-resolution satellite imagery in
a highly responsive manner. Its goal is to enable interactive
exploration of NASA’s Earth imagery for a broad range of users.
LANCE-MODIS: NASA service that provides certain MODIS data in
MODIS: Moderate Resolution Imaging Spectroradiometer, a key
instrument on the Terra (EOS AM) and Aqua (EOS PM) satellites.
Terra’s orbit around the Earth is timed so that it passes from north to
south across the equator in the morning, while Aqua passes south
to north over the equator in the afternoon.
WMTS: The Open Geospatial Consortium’s Web Map Tile Service, a
specification for serving pre-rendered, georeferenced map tiles.
Zoom levels 0-8: 0 corresponds to a 360º view of the whole world at
a scale of 1:500,000,000, with each pixel representing 156km;
8 corresponds to a 1.4º view at a scale of 1:2,000,000, each pixel