Data Integration: Using TAPIR as an asynchronous caching protocol

13 Dec 2013

Aaron Steele

asteele@berkeley.edu

University of California at Berkeley

Museum of Vertebrate Zoology

[Diagram sequence: Network and Application; then Network, Application, and Cache]

“Nanos Gigantum Humeris Insidentes.”

- Isaac Newton

How Can Google Help?


Google Base


Google Subscribed Links

Google Base


Submit record metadata: form, bulk, API


Google creates your data index


Query data using Google Base protocol


Search results link back to your data


Track usage statistics


Change or delete metadata


No storage or transmission limits


Check the TOS for details

BioCase & TAPIR Adapters to Google Base

[Diagram: Adapter, Application, Google Base (cache)]


Google Subscribed Links


“Add custom search results to Google”


You define query, result format, result link


Dynamic! Supply XML, TSV or RSS feeds


Include images or gadgets (maps, etc)



Users subscribe to your links

A Word about Citations


A link is essentially a citation


Search results from Google Base and
Subscribed Links return pointers (links) to
your data, not the actual data

[Diagram: Network, Application, Cache; labels: TAPIR Protocol, SQL]

Data Harvesting Software


Java 1.5, Eclipse, dom4j, Hibernate, MySQL


XML configuration


Resource access points and a global set of
filtered concepts to cache


HigherGeography = Madagascar


Class = Aves OR Class = Reptilia


CoordinateUncertaintyInMeters != null


Harvest via TAPIR inventory requests (KVP)



Paged inventories were handled with an
Inventory class that implemented the
Iterator interface
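
The paged harvest described above can be sketched roughly as follows. This is a Python illustration, not the Java 1.5 harvester itself. The `Inventory` class name comes from the slide; `build_inventory_url`, `fetch_page`, the default page size, and the KVP parameter names (`op`, `concept`, `start`, `limit`) are assumptions that should be checked against the TAPIR specification. The filter is left out because its KVP syntax is not shown in the slides.

```python
from urllib.parse import urlencode


def build_inventory_url(access_point, concept, start, limit):
    """Build a KVP inventory request URL (parameter names are assumed)."""
    params = {"op": "inventory", "concept": concept,
              "start": start, "limit": limit}
    return access_point + "?" + urlencode(params)


class Inventory:
    """Iterate over every value of a paged inventory.

    fetch_page is a hypothetical callable: given a request URL it returns
    (values, next_start), where next_start is None on the last page.
    """

    def __init__(self, access_point, concept, fetch_page, limit=100):
        self.access_point = access_point
        self.concept = concept
        self.fetch_page = fetch_page
        self.limit = limit

    def __iter__(self):
        start = 0
        while start is not None:
            url = build_inventory_url(self.access_point, self.concept,
                                      start, self.limit)
            # Each call retrieves one page and the offset of the next one.
            values, start = self.fetch_page(url)
            for value in values:
                yield value
```

Hiding the paging behind an iterator lets the caller write a plain `for` loop over an inventory without tracking offsets.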



[Diagram: Network, Application, DwC Cache; labels: TAPIR Protocol, SQL, Update Feeds]

Data Synchronization Implementation


Network records added, removed, changed


Cache must reflect these changes


PHP 5, SQLite application


Register resources


Generates Atom & RSS GUID update feeds


Compares successive copies of GUID-DLM inventories:


if new GUID detected, record INSERT


if DLM changed, record UPDATE


if old GUID missing, record DELETE
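
The three comparison rules above amount to a diff between two snapshots of the same inventory. A minimal sketch, in Python rather than the PHP 5 and SQLite application the slide describes: each snapshot is assumed to be a mapping from GUID to its DLM value, and the function name and sample data are hypothetical.

```python
def diff_inventories(previous, current):
    """Compare two GUID -> DLM mappings and return the cache operations."""
    # New GUID detected: the record must be inserted into the cache.
    inserts = sorted(g for g in current if g not in previous)
    # Old GUID missing: the record must be deleted from the cache.
    deletes = sorted(g for g in previous if g not in current)
    # Same GUID, different DLM: the cached record must be updated.
    updates = sorted(g for g in current
                     if g in previous and current[g] != previous[g])
    return {"INSERT": inserts, "UPDATE": updates, "DELETE": deletes}


# Made-up snapshots for illustration only.
previous = {"guid:1": "2008-01-01", "guid:2": "2008-01-01", "guid:3": "2008-01-01"}
current = {"guid:1": "2008-01-01", "guid:2": "2008-03-15", "guid:4": "2008-03-20"}
changes = diff_inventories(previous, current)
# changes == {"INSERT": ["guid:4"], "UPDATE": ["guid:2"], "DELETE": ["guid:3"]}
```

Each entry in `changes` could then become one item in the update feed.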


HerpNET Proof of Concept


Class = Reptilia OR Class = Amphibia


CoordinateUncertaintyInMeters != null


20 of 80 providers accessible via TAPIR


200k cached georeferenced records


AmphibiaWeb synonymy lookup on scientific
name using a synonymy server; each synonym
is looked up in the cache for coordinates,
then results are mapped using BerkeleyMapper


Query times reduced from 15 s to 5 ms
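
The lookup described above, which expands a name to its synonyms and then reads coordinates from the local cache, might look roughly like this. It is a sketch under stated assumptions: an in-memory SQLite table stands in for the cache, a plain dictionary stands in for the synonymy server, and the table layout, function name, and taxon names are all made up for illustration.

```python
import sqlite3


def coordinates_for(conn, name, synonyms):
    """Return cached (name, lat, lon) rows for a name and its synonyms."""
    names = [name] + list(synonyms.get(name, []))
    placeholders = ",".join("?" for _ in names)
    sql = ("SELECT scientific_name, latitude, longitude FROM cache "
           "WHERE scientific_name IN (%s) "
           "ORDER BY scientific_name" % placeholders)
    return conn.execute(sql, names).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cache "
             "(scientific_name TEXT, latitude REAL, longitude REAL)")
# An index on the name column keeps the lookup local and fast.
conn.execute("CREATE INDEX idx_name ON cache (scientific_name)")
conn.executemany("INSERT INTO cache VALUES (?, ?, ?)", [
    ("Genus alpha", 45.5, -122.6),
    ("Genus beta", 37.9, -122.3),
    ("Genus gamma", 40.0, -120.0),
])

# Hypothetical stand-in for the synonymy server's answer.
synonyms = {"Genus alpha": ["Genus beta"]}
rows = coordinates_for(conn, "Genus alpha", synonyms)
# rows == [("Genus alpha", 45.5, -122.6), ("Genus beta", 37.9, -122.3)]
```

Because every synonym is resolved against the local cache, no provider on the network is contacted at query time.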

Future Work


ReBioMa Project


Funded by MacArthur


Dynamic SDM (species distribution modeling) for Madagascar


MaxEnt using cached records from TAPIR
providers georeferenced by BioGeomancer


New models fitted and projected when
cache updates