Open Source Methods to Integrate Enterprise Data at OSU

Dec 4, 2013

THE OHIO STATE UNIVERSITY


DRUPAL CAMP
KMDATA

OCIO, ENG, ASC, VetMed, KSA, etc.


Introduction:
Many Isolated Web Groups


Introduction:
Reinventing the Wheel

1891 Patent for Wheel

Implementing a Directory


Duplicated Effort

2-3 weeks * 50 groups = 2-3 years of labor


Missing the Target

Divergent Efforts


Introduction:
Many Data Sources, Policies & Tech


Introduction:
Reinventing the Data

A faculty member with profiles/information on at least 8 different sites.

Choose: Stale Information or Duplicated Effort

8 x 1 hr x 20,000 employees = 80 years of labor (worst case)
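The labor figures on the introduction slides can be reproduced with a few lines of arithmetic. The sketch below is only a sanity check: the slides do not state how a "year of labor" is defined, so the 2,000 working hours and 50 working weeks per year used here are assumptions, and the function names are made up for illustration.

```python
# Assumed conversion factors (not stated on the slides).
HOURS_PER_WORK_YEAR = 2000  # 40 hours/week * 50 weeks
WEEKS_PER_WORK_YEAR = 50


def directory_labor_years(weeks_per_group, groups):
    """Years spent when every group builds its own directory."""
    return weeks_per_group * groups / WEEKS_PER_WORK_YEAR


def profile_labor_years(sites, hours_per_site, employees):
    """Years spent when every employee updates every profile by hand."""
    return sites * hours_per_site * employees / HOURS_PER_WORK_YEAR


# Directory estimate: 2-3 weeks for each of 50 groups.
print(directory_labor_years(2, 50), directory_labor_years(3, 50))  # 2.0 3.0

# Profile estimate: 8 sites, 1 hour each, 20,000 employees.
print(profile_labor_years(8, 1, 20_000))  # 80.0
```

Under those assumptions the numbers match the slides: 2 to 3 years for the duplicated directories and 80 years for the worst-case profile upkeep.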


Introduction:
Hard to Answer Questions

Do students who took a course in Java have a better chance of getting a co-op or internship?

Sources: SIS & Courses, Career Services

What are Professor X’s strengths and weaknesses relative to his department peers (publications, grants, teaching evaluations)?

Sources: OSU:pro, SIS, HR


Introduction:
How do we…?

Efficiently Work Together

Connect & Leverage Data

Answer the Hard Questions


Introduction:
Agenda

An information system to connect the dots.

Tools / Methods for Collaboration

Drupal Data Integration to encourage adoption.



Collaboration:
The Challenge


Few official methods exist to pool resources across organizational boundaries

Why not use drupal.org, GitHub, SourceForge, etc.?


Various policies restrict the use of external services


Licensing concerns


Need flexibility


Platform agnostic




Collaboration:
The Solution

Use various open source tools internally

Git / gitweb / gitosis

Distributed version control

Code repositories

Drupal instance with project module

Release management and versioning

Update repositories

Issue tracking

Usage statistics

Shibboleth account integration


Collaboration:
Version Control

Many Types: SVN, CVS, Git, Windows, Mac OS

Advantages: Collaboration, conflict resolution, undo, versioning, branching


Collaboration:
Git



Open source, distributed version control system

Every Git clone contains a full copy of the current version and past versions of the repository

Now the primary VCS used on drupal.org




Collaboration:
Git Workflow




Collaboration:
Gitweb



http://code.web.engadmin.ohio-state.edu


Collaboration:
Gitosis


Gitosis is an access manager for Git repositories

Lightweight

Access control

SSH

Public/private key authentication

Managed via Git

Might also consider gitolite

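Gitosis keeps its access rules in a `gitosis.conf` file that lives in an admin Git repository, with one `[group ...]` section per group listing `members`, `writable` and `readonly` repositories. The sketch below only models how such rules can be read; it is not Gitosis's own code, and the users, groups, repository names and helper functions are hypothetical.

```python
import configparser

# Hypothetical configuration in gitosis.conf style; all names are made up.
EXAMPLE_CONF = """
[gitosis]

[group gitosis-admin]
members = alice
writable = gitosis-admin

[group directory-devs]
members = alice bob
writable = kmdata_directory
readonly = shared_theme
"""


def _grants(conf_text, user, repo, options):
    """Return True if any group containing `user` lists `repo` under one of `options`."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    for section in parser.sections():
        if not section.startswith("group "):
            continue  # skip the global [gitosis] section
        members = parser.get(section, "members", fallback="").split()
        if user not in members:
            continue
        for option in options:
            if repo in parser.get(section, option, fallback="").split():
                return True
    return False


def can_write(conf_text, user, repo):
    return _grants(conf_text, user, repo, ["writable"])


def can_read(conf_text, user, repo):
    # In this model, write access implies read access.
    return _grants(conf_text, user, repo, ["writable", "readonly"])


print(can_write(EXAMPLE_CONF, "bob", "kmdata_directory"))  # True
print(can_write(EXAMPLE_CONF, "bob", "gitosis-admin"))     # False
print(can_read(EXAMPLE_CONF, "bob", "shared_theme"))       # True
```

Because the rules are plain text in a repository, changing who can push is itself a reviewed, versioned commit.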


Collaboration:
Get Git

From the source
Git project: http://git-scm.com/

Windows
msysgit: http://code.google.com/p/msysgit/

Mac OS
git-osx-installer: http://code.google.com/p/git-osx-installer/

Linux
Many distributions maintain git packages in their software repository

Gitosis
http://eagain.net/gitweb/




Collaboration:
Drupal Source Control

Drupal.org provides excellent project management and source control in a tightly integrated package

Also serves updates for modules and themes

Implemented many d.o modules and conventions in an internal source control site

Allows for easy cross-posting to drupal.org

Hosted at http://source.web.engadmin.ohio-state.edu

Currently hosting over 60 projects




Collaboration:
Drupal Source Control Benefits

Customizable

Supports non-Drupal projects

Issue Queue

Automatic development builds

Automatic site updates

Drush make integration

Usage statistics

Easy cross-posting to drupal.org


Collaboration:
Source Control Modules

[Diagram: Git repositories, Version Control API, Project, Project Release, Release package script, Release archive files, XML generation script, Update XML, XML Wrapper, Drupal Site]
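The diagram names an XML generation script that produces the update XML consumed by Drupal sites. As a rough illustration of what such a script has to emit, the sketch below builds a small release-history document. The element names are modeled on the feed that drupal.org serves to the update module, but only a subset is shown and the exact set should be treated as an assumption; the function name, project name, version and URL are hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET


def build_release_history(title, short_name, api_version, releases):
    """Build a simplified release-history XML string for one project.

    `releases` is a list of dicts with 'version', 'url' and 'archive' (bytes).
    """
    project = ET.Element("project")
    ET.SubElement(project, "title").text = title
    ET.SubElement(project, "short_name").text = short_name
    ET.SubElement(project, "api_version").text = api_version
    releases_el = ET.SubElement(project, "releases")
    for rel in releases:
        release = ET.SubElement(releases_el, "release")
        ET.SubElement(release, "name").text = f"{short_name} {rel['version']}"
        ET.SubElement(release, "version").text = rel["version"]
        ET.SubElement(release, "download_link").text = rel["url"]
        # Checksum of the packaged archive so clients can verify the download.
        ET.SubElement(release, "mdhash").text = hashlib.md5(rel["archive"]).hexdigest()
    return ET.tostring(project, encoding="unicode")


# Hypothetical project and release, for illustration only.
xml_text = build_release_history(
    "KMData Directory",
    "kmdata_directory",
    "6.x",
    [{
        "version": "6.x-1.2",
        "url": "https://example.org/files/kmdata_directory-6.x-1.2.tar.gz",
        "archive": b"placeholder archive bytes",
    }],
)
print(xml_text)
```

A release package script would supply the archive and its URL; the XML step only describes what has already been packaged.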









Collaboration:
Shibboleth

Internet2 web single sign-on

Secure, federated authentication

Interfaces with university credential management system

Ties a Drupal user account to an OSU (or other institution) user account
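Tying a site account to an institutional account usually comes down to looking up the identity that the Shibboleth service provider passes along with the request, and creating a local account the first time it is seen. The sketch below shows that idea in isolation. It is not the shib_auth module's code: the attribute names `eppn` and `mail` depend on how the service provider is configured and are assumptions here, as are the function name and the sample identity.

```python
def find_or_create_account(server_vars, accounts):
    """Map a Shibboleth-authenticated request to a local account.

    `server_vars` stands in for the request environment populated by the
    service provider; `accounts` is a dict acting as the local user table.
    """
    eppn = server_vars.get("eppn")
    if not eppn:
        raise PermissionError("no Shibboleth identity in request")
    account = accounts.get(eppn)
    if account is None:
        # First visit: create the local account tied to the federated identity.
        account = {"name": eppn, "mail": server_vars.get("mail", "")}
        accounts[eppn] = account
    return account


accounts = {}
request = {"eppn": "example.1@osu.edu", "mail": "example.1@osu.edu"}  # hypothetical user
first = find_or_create_account(request, accounts)
second = find_or_create_account(request, accounts)
print(first is second, len(accounts))  # True 1
```

The site never handles a password; it only trusts the identity asserted by the institution's login service.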



Collaboration:
Source Control Modules

Project module
http://drupal.org/project/project

Version Control API
http://drupal.org/project/versioncontrol

Version Control API -- Git backend
http://drupal.org/project/versioncontrol_git

Version Control / Project* integration
http://drupal.org/project/versioncontrol_project

Drupal.org customizations
http://drupal.org/project/drupalorg

Shibboleth authentication
http://drupal.org/project/shib_auth


Collaboration:
Other Efforts


In addition to technical solutions, we have also adopted other “open source” collaborative practices



Collaboration:
Outcomes

Starting slow, but growing


College of Engineering

College of the Arts and Sciences

College of Veterinary Medicine

Office of the Chief Information Officer

Knowlton School of Architecture

Health Sciences Library

Overall, positive

Lots of room for improvement


KMData Background


What is KMData?


Need for a central repository to combine and merge data from decentralized data silos across the institution

Open-source architecture enables contributions from development community

Built using open source development tools


KMData Requirements


Use Cases


Department directories on web sites

Course listings tied to syllabi

Need centralized data mart and API to access the combined information

Data is merged into a core schema capable of fitting different types of elements in a relatively small number of tables

User Data



KMData Architecture

Data Sources -> ETLs -> Target Schemas -> Core Schema -> Web Services -> Adaptor APIs

External Data Silos (various sources)
HR/SIS: Oracle
OSU:pro: MS SQL Server
Digital Library: MySQL

ETLs
Pentaho Data Integration (Kettle)
One Kettle ETL per source
ETLs run on a nightly schedule with crontab

KMData Database (PostgreSQL 9.0 on CentOS)
Target schemas: HR/SIS Schema, OSU:pro Schema, Digital Library Schema
Merge Stored Procedures (PL/pgSQL)
KMData Core Schema

KMData Application
Web Service API written in Java on CentOS web server
WS Methods

Adaptor APIs
Drupal API, PHP API, Ruby API, Java API, .NET API

Legend: Focus of Presentation, Direction of Data Flow, Direction of API Calls
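The merge step in the architecture, where per-source schemas are combined into the core schema, can be pictured as a join keyed on a shared identifier. The sketch below is a stand-in only: the real merge is described as PL/pgSQL stored procedures in PostgreSQL, whereas this uses SQLite so it runs anywhere, and every table name, column name and sample row is hypothetical.

```python
import sqlite3


def merge_people(conn):
    """Merge staged HR and OSU:pro rows into one core table keyed by person id."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS core_person (
            person_id TEXT PRIMARY KEY, name TEXT, title TEXT, bio TEXT);
        INSERT OR REPLACE INTO core_person (person_id, name, title, bio)
        SELECT hr.person_id, hr.name, hr.title, pro.bio
        FROM hr_person AS hr
        LEFT JOIN pro_person AS pro ON pro.person_id = hr.person_id;
    """)


# Staging tables as the nightly ETL jobs might leave them (made-up rows).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hr_person (person_id TEXT, name TEXT, title TEXT);
    CREATE TABLE pro_person (person_id TEXT, bio TEXT);
    INSERT INTO hr_person VALUES ('p1', 'Example Person', 'Professor');
    INSERT INTO hr_person VALUES ('p2', 'Second Person', 'Lecturer');
    INSERT INTO pro_person VALUES ('p1', 'Researches example topics.');
""")

merge_people(conn)
rows = conn.execute(
    "SELECT person_id, name, bio FROM core_person ORDER BY person_id"
).fetchall()
print(rows)
```

The left join keeps people who exist in HR but have no profile yet, and rerunning the merge replaces rows instead of duplicating them, which suits a nightly schedule.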


Data Integration:
Directory/Profiles as First Use Case

Note: Directories and profiles are 20-30% of traffic on some department sites.


Data Integration:
Which Approach is Best?

Feeds

Custom API


Data Integration:
Feeds at a Glance

Feeds Pros


#1 Data Integration Module


23k Sites Using It


Syncs to CCK or Data Tables


RSS, CSV, XML, OPML, etc.


Extremely Extensible


Feeds Cons


Very Complex Configuration


Can’t Get There From Here




Data Integration:
Feeds at a Glance



Fetcher: File Upload, HTTP Fetcher, SQL, Services

Parser: CSV, SimplePie, XPath, RSS 1/2/Atom

Processor: Node Processor, Taxonomy Term, User, Data
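The three-stage shape of Feeds (fetch raw data, parse it into items, process items into stored records) is easy to see in miniature. The sketch below mirrors that pipeline with plain functions. It is not the Feeds module's API; the function names, the `guid` key and the sample rows are hypothetical.

```python
import csv
import io


def fetch(source):
    """Fetcher: return the raw payload. Here the source is already a string."""
    return source


def parse(raw):
    """Parser: turn raw CSV text into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(raw)))


def process(rows, store, key="guid"):
    """Processor: insert or update each row in the store, keyed by a unique id."""
    for row in rows:
        store[row[key]] = row
    return store


def run_import(source, store):
    """Run one import: fetcher, then parser, then processor."""
    return process(parse(fetch(source)), store)


store = {}
run_import("guid,title\n1,First\n2,Second\n", store)
run_import("guid,title\n2,Second (edited)\n3,Third\n", store)
print(sorted(store))        # ['1', '2', '3']
print(store["2"]["title"])  # Second (edited)
```

Each stage can be swapped independently, which is where both the flexibility and the configuration burden come from.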


Data Integration:
Amazon API

Amazon Pros


Solves Amazon Product Integration


2.3k sites use it (10% of feeds)


CCK field


Caching API


Easy to Use and Reuse


Black Box Architecture


Feeds Support (CCK field)


Amazon Cons


Black Box Architecture


Merging Data Difficult




Data Integration:
Amazon API

Amazon Caching API

My Content Type

ASIN CCK Field

Views

Create Content Type

Add ASIN Field

Enjoy


Data Integration:
Feeds vs. Amazon Style API

Feeds allows a site builder to:


Solve a general class of data sync problems with heavy configuration.


Configure where everything is stored and how it is processed.


The Amazon API modules:


Solve the problem of interfacing with Amazon data. It. Just. Works.


Provide developer friendly encapsulation.


A good approach if you want to promote pluggable reuse of data.


Data Integration:
Our Basic Approach

KmData Web Services

Drupal Reusable Components
PHP OO WS Client (Library in Drupal): KmObject, KmPerson
KmData API (serialized caching)
KmPerson API (structured caching for views)
KmPerson CCK Field

Drupal Directory Feature
Person Content Type
KmPerson CCK Field Instance
Directory View
Management View (VBO)
Other Features Goodness, Theme Functions, etc.
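The "serialized caching" layer between the web-service client and the rest of the site can be sketched as a wrapper that stores each response in serialized form with an expiry time. This is an illustration of the general pattern, not the KmData module itself: the class name, the JSON serialization, the one-hour default and the fake fetch function are all assumptions.

```python
import json
import time


class CachedClient:
    """Wrap a fetch function with a time-limited cache of serialized responses."""

    def __init__(self, fetch, ttl_seconds=3600, clock=time.time):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}  # key -> (expires_at, serialized payload)

    def get(self, key):
        now = self._clock()
        entry = self._cache.get(key)
        if entry is not None and entry[0] > now:
            return json.loads(entry[1])  # cache hit: deserialize and return
        payload = self._fetch(key)       # cache miss or expired: call the service
        self._cache[key] = (now + self._ttl, json.dumps(payload))
        return payload


# Stand-in for a web-service call; records how often it is invoked.
calls = []


def fake_fetch(person_id):
    calls.append(person_id)
    return {"id": person_id, "name": "Example Person"}


now = [0.0]  # controllable clock so expiry can be demonstrated
client = CachedClient(fake_fetch, ttl_seconds=60, clock=lambda: now[0])
client.get("p1")
client.get("p1")
print(len(calls))  # 1
now[0] = 61.0
client.get("p1")
print(len(calls))  # 2
```

Repeated page views hit the cache, so the upstream service is called once per person per expiry window instead of once per request.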


Data Integration:
Basic CCK Field Goodness In Use

1. Add Field to Content Type.

2. Populate Field.

3. Profit!


Data Integration:
Directory Shows More Data in Profiles


Data Integration:
$#@! That’s not my address!

My address is wrong!


Us: Can you update it in HR?


Them: No!


Other common problems…


I want different contact info on different sites.


I want different bios on different sites (with different audiences).

I want some feature not supported by the upstream data source (images)



Data Integration:
Contexts and Overrides

Override Module(s)


Can override arbitrary values.


Can be automatic, possibly from Active Directory.

Can be user interactive, allowing the user to manually override.

Can apply to a class of KmObject.

Operate in a stack.

Context Module

A collection of configured overrides applied to a certain context.

Think input formats for KmObjects.


Can be associated to multiple uses.
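The override stack and context ideas above can be modeled as an ordered list of functions applied to an object, where later entries win. The sketch below shows that model only; it is not the module's implementation, and the function names, field names and sample values are hypothetical.

```python
def manual_override(values):
    """Return an override that replaces fields with hand-entered values."""
    def apply(obj):
        return {**obj, **values}
    return apply


def directory_lookup_override(directory):
    """Return an override that fills fields from a directory keyed by person id."""
    def apply(obj):
        return {**obj, **directory.get(obj["id"], {})}
    return apply


def apply_context(obj, context):
    """Apply a context (an ordered stack of overrides); later entries win."""
    for override in context:
        obj = override(obj)
    return obj


# Made-up upstream record and directory data.
upstream = {"id": "p1", "name": "Example Person", "address": "100 Old Hall"}
directory = {"p1": {"address": "200 New Hall", "phone": "555-0100"}}

# One context: automatic directory lookup first, then a manual correction.
profile_context = [
    directory_lookup_override(directory),
    manual_override({"phone": "555-0199"}),
]

result = apply_context(upstream, profile_context)
print(result)
```

Because each override returns a new dict, the upstream record is left untouched, and the same context can be reused wherever that presentation of a person is needed.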


Data Integration:
Contexts and Overrides

Use Cases (context: field setting or API arg): Directory Profile, Directory View, User Pages

Contexts (groups of overrides): Directory (KmPerson), Default (KmObject)

Override Modules: Manual Field Override, Active Directory, LDAP


Data Integration:
Context/Override Configuration



Data Integration:
Overrides in CCK or Standalone


Multiple ways to provide override forms.

Overrides can identify source system and non-overridden values.
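For an override form to show where a value came from and what it would be without the override, each resolved field needs to carry its provenance. The sketch below is one possible way to record that, under the assumption that values arrive as ordered layers with the upstream source first; the function name, layer names and sample values are hypothetical.

```python
def resolve_field(field, layers):
    """Resolve one field through ordered (source_name, values) layers.

    Returns the winning value, the source that supplied it, and the
    value from the first layer so a form can display both.
    """
    base_name, base_values = layers[0]
    original = base_values.get(field)
    value, source = original, base_name
    for name, values in layers[1:]:
        if field in values:
            value, source = values[field], name
    return {"value": value, "source": source, "original": original}


# Made-up layers: upstream HR data first, then a manual override.
layers = [
    ("HR", {"name": "Example Person", "address": "100 Old Hall"}),
    ("manual", {"address": "200 New Hall"}),
]

print(resolve_field("address", layers))
print(resolve_field("name", layers))
```

Showing the source system next to an overridden value also makes it obvious where the data should really be corrected.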



Data Integration:
Summary

Simple to Implement

Easy to Extend

Very Themable


Simple to Use

Easy to Override

Encourages Correcting the Source


Closing:
Summary

An information system to connect the dots.

Tools / Methods for Collaboration

Drupal Data Integration to encourage adoption.



Closing:
Questions

College of Engineering

College of the Arts and Sciences

College of Veterinary Medicine

Office of the Chief Information Officer

Knowlton School of Architecture

Health Sciences Library


http://kmdata.osu.edu/

http://source.web.engadmin.ohio-state.edu/projects/kmdata