Nov 5, 2013

Damia: Data Mashups for Intranet Applications

David E. Simmen et al.

IBM Almaden Research Center


Presented by John Nielsen

Terms and Acronyms (in order of appearance)


Damia: DAta Mashups for Intranet Applications


Web 2.0: Focus on applications, collaboration and interaction (rather than pages and browsing)


AJAX: Asynchronous JavaScript and XML. Umbrella term for web development techniques that utilize background data transfer and scripted client-side applications


ATOM: XML-based web syndication (feed) format standard nominally intended to replace RSS


XML: eXtensible Markup Language


JSON: JavaScript Object Notation. Lightweight data structure transmission format


RSS: Really Simple Syndication. XML-based web syndication (feed) format


REST: REpresentational State Transfer. Umbrella term for simple interfaces used to transfer data via (e.g.) HTTP without another messaging layer


PHP: PHP Hypertext Preprocessor. Server-side scripting language that can be embedded in HTML


Ruby on Rails: complete framework for building database-backed web applications with (intended) relative ease


API: Application Programming Interface. Abstraction of the functions, classes, etc. in a program or library that are available to other programs


GUI: Graphical User Interface


URL: Uniform Resource Locator


LAMP stack: Originally Linux, Apache, MySQL, PHP. Generalized to mean any solution stack composed of free/open source software used to run a web application server


Zend Framework: Open source, object-oriented web application framework written in PHP


DB2: a family of IBM relational database products

Terms and Acronyms (continued)


MySQL: a freely available relational database


Dojo toolkit: Tools and utilities for creating AJAX/JavaScript applications


LOB: Line of business(?)


ADM: Augmentation-level Data Model


XQuery: query language for extracting data from XML documents


XDM: XQuery Data Model


FLWOR expression: For, Let, Where, Order by, Return. Style of XQuery query that performs projections and joins on one or more XML sources and returns a sorted list of results


Closed operator: Given an operator (or function or transformation) and a domain (or set of inputs), the operator is closed under the domain if for every member of the domain the result of the operation is also a member of the domain


MIME types: Multipurpose Internet Mail Extensions. A standard and extensible set of document types used to identify the content of e-mail and HTML docs


CSV: Comma-Separated Values. A simple text format for tabular data where fields (or columns) are delimited by a comma and records (or rows) are delimited by a newline character


DOM: Document Object Model. Standard model for representing XML documents as objects


Curl: Open source tool and libraries for retrieving remote files and documents via URL using HTTP, FTP and other protocols


GNR: IBM Global Name Recognizer (phonetic similarity)


EII: Enterprise Information Integration


ETL: Extract, Transform, Load. A method of pulling data into a warehouse


RDF: Resource Description Framework. Simple model for describing metadata for e.g. the semantic web

Motivation


Business leaders want “situational” applications that use data from many sources, some of them nontraditional


Web technologies have evolved to allow information exchange and collaboration using lightweight standards (web services, Web 2.0, etc.)


Damia uses modern web technology to allow business users to create “mashups” using whatever data sources they choose

Damia Feed Server Architecture

How does it work?


User specifies sources either as existing RSS/ATOM feeds or as custom feeds


Custom feeds can be created by uploading documents of a known type (CSV) or through a data-source-specific connector


User specifies filters, join conditions and other transformations using a fixed set of operators


Damia converts the user input to an XML-formatted Mashup specification


On execution, the Augmentation engine reads the sources, performs the mashup operations, and publishes the result as a new RSS/ATOM feed


Mashup results are intended to be requested and consumed by other applications
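
The flow above can be sketched in a few lines of Python. This is only an illustration, not Damia's implementation: the function names (`csv_to_entries`, `run_mashup`, `publish_atom`), the sample data and the single filter step are all assumptions, and the output omits elements a valid Atom feed requires (such as `id` and `updated`).

```python
import csv
import io
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def csv_to_entries(text):
    """Custom feed (hypothetical): turn an uploaded CSV document into entry dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def run_mashup(entries, predicate):
    """Stand-in for the engine: apply a single filter step to the source entries."""
    return [e for e in entries if predicate(e)]


def publish_atom(entries, title):
    """Serialize the result as a simplified Atom-style feed (not schema-complete)."""
    feed = ET.Element(f"{{{ATOM_NS}}}feed")
    ET.SubElement(feed, f"{{{ATOM_NS}}}title").text = title
    for e in entries:
        entry = ET.SubElement(feed, f"{{{ATOM_NS}}}entry")
        ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = e["name"]
        ET.SubElement(entry, f"{{{ATOM_NS}}}content").text = ", ".join(
            f"{k}={v}" for k, v in e.items()
        )
    return ET.tostring(feed, encoding="unicode")


# Invented sample data: two customers, one of them in a 95xxx zip code.
SAMPLE_CSV = "name,zip\nAlice,95120\nBob,10001\n"

entries = csv_to_entries(SAMPLE_CSV)
selected = run_mashup(entries, lambda e: e["zip"].startswith("95"))
feed_xml = publish_atom(selected, "Customers in 95xxx")
```

Another application would then request `feed_xml` over HTTP and parse it like any other feed.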

Damia Integration Engine

Augmentation model vs. Feed model

Available Operators (from MashupHub)


Source: Import data to the mashup


Combine: Create one feed from two or more inputs


Filter: Output only entries from the input satisfying certain conditions


For Each: Place each value from one operator into a URL parameter of another operator and return the results


Group: Organize entries into categories based on a specified element


Merge: Join two inputs based on certain match conditions


Publish: Specify output format of the mashup


Sort: Re-order entries based on a specified field value


Transform: Modify entries from the input based on specified text or math functions
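
One way to picture these operators is as functions that take feeds (here, plain lists of dicts) and return feeds, so their outputs can be chained freely, which is the "closed operator" idea from the glossary. The sketch below is an assumption-laden toy, not the MashupHub API: every function name, signature and sample record is hypothetical.

```python
def combine(*feeds):
    """Combine: one feed containing the entries of all inputs, in order."""
    return [entry for feed in feeds for entry in feed]


def filter_entries(feed, predicate):
    """Filter: keep only the entries that satisfy the condition."""
    return [entry for entry in feed if predicate(entry)]


def sort_entries(feed, field):
    """Sort: re-order entries by the value of one field."""
    return sorted(feed, key=lambda entry: entry[field])


def group(feed, field):
    """Group: one output entry per distinct value of the field."""
    groups = {}
    for entry in feed:
        groups.setdefault(entry[field], []).append(entry)
    return [{"key": key, "entries": members} for key, members in groups.items()]


def merge(left, right, on):
    """Merge: join two feeds on equal values of a shared field."""
    return [{**l, **r} for l in left for r in right if l[on] == r[on]]


def transform(feed, fn):
    """Transform: apply a function to every entry."""
    return [fn(entry) for entry in feed]


# Invented sample feeds.
feed_a = [{"id": 2, "city": "Austin"}, {"id": 1, "city": "Boston"}]
feed_b = [{"id": 3, "city": "Austin"}]

combined = combine(feed_a, feed_b)
austin = sort_entries(filter_entries(combined, lambda e: e["city"] == "Austin"), "id")
merged = merge(feed_a, [{"id": 1, "temp": 40}], on="id")
by_city = group(combined, "city")
```

Because every function returns a list of entries, any output can be passed straight into another operator, as `austin` does with filter followed by sort.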

Usage Scenarios (from paper)


Customer Service


Receive name suggestions from phonetic similarity matcher (source, transform)


Look up matches in customer service DB (source, merge (augment))


Adjust output to desired format (transform)


Publish


Usage Scenarios (continued)


Weather Alerts for Insurance Agent


Upload insurance data spreadsheet (source, via custom feed)


Identify zip codes from spreadsheet (filter)


Import weather alerts from NWS (source)


Compare/match zip codes from spreadsheet with zip codes from weather alerts (merge (augment))


Publish formatted list of customers likely to be affected by severe weather (transform, publish)
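
The weather-alert scenario amounts to a join on zip code followed by formatting. The sketch below walks through those steps on invented data. The spreadsheet columns, the alert records and the `affected_customers` helper are assumptions for illustration, and a real NWS feed would need its own parsing to arrive at zip-keyed alerts.

```python
import csv
import io

# Invented stand-in for the uploaded insurance spreadsheet (custom feed).
POLICIES_CSV = """name,zip,policy
Ana Ruiz,73301,home
Ben Cho,10001,auto
Cy Day,73301,auto
"""

# Invented stand-in for weather alerts already reduced to zip code + event.
ALERTS = [
    {"zip": "73301", "event": "Tornado Warning"},
    {"zip": "33101", "event": "Hurricane Watch"},
]


def affected_customers(policies_csv, alerts):
    """Merge customers with alerts on zip code and format one line per match."""
    customers = list(csv.DictReader(io.StringIO(policies_csv)))

    # Index alerts by zip code so each customer lookup is a dict access.
    alerts_by_zip = {}
    for alert in alerts:
        alerts_by_zip.setdefault(alert["zip"], []).append(alert["event"])

    report = []
    for customer in customers:
        events = alerts_by_zip.get(customer["zip"])
        if events:  # keep only customers whose zip code has an active alert
            report.append(f'{customer["name"]} ({customer["zip"]}): {", ".join(events)}')
    return report


report = affected_customers(POLICIES_CSV, ALERTS)
```

In Damia terms, reading the CSV and the alerts corresponds to the source operators, the zip-code lookup to merge (augment), and the formatted strings to transform and publish.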


Demonstration


https://greenhouse.lotus.com/mashuphub (free login required)

Future Work


Data Standardization


Continuous mashups: true mashup subscription rather than just on-demand


Additional data import connectors


In-depth search


Data quality (or pedigree)

Questions?