CS 157B: Database Management Systems II

May 6 Class Meeting

Department of Computer Science

San Jose State University


Spring 2013

Instructor: Ron Mak

www.cs.sjsu.edu/~mak


Department of Computer Science

Spring 2013: May 6

CS 157B: Database Management Systems II

© R. Mak


Student Surveys


Don’t forget to do your student surveys online!


Project #5 Presentations Monday, May 13


Section 1


INVIKO


Leftovers


T-Rex



Section 2


Team C


Musicmen


Random


UnlimitedData


Xeon



What is your application?


What are your data sources?


Show with screen shots or live with Composite Studio.


Demo of your client application.


Unstructured Data


Structured data


Relational databases


tables, rows and columns


database schema


XML


nested hierarchies


XML schema


Unstructured data


No data model (schema)


Might be “semi-structured”


Examples


images, videos, audios, and other binary data files


text files, Word documents, PDF files


spreadsheets


executables


Unstructured Data


How to query unstructured data?


Use metadata to describe the stored data.


metadata = data (descriptors) about data (what’s stored)


Queries against the stored data are actually queries against their metadata.



Major challenges:


Define and maintain the appropriate set of metadata.


Construct and execute proper queries against the metadata.
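
The idea that queries run against metadata rather than against the stored bytes can be sketched in a few lines of Python. This is only an illustration: the StoredItem class, the query function, and the sample repository are hypothetical names and data, not part of any particular system.

```python
from dataclasses import dataclass, field


@dataclass
class StoredItem:
    """An opaque blob plus the metadata that describes it."""
    content: bytes                      # unstructured data; queries never look inside it
    metadata: dict = field(default_factory=dict)


def query(items, **criteria):
    """Return the items whose metadata matches every attribute=value criterion."""
    return [
        item for item in items
        if all(item.metadata.get(attr) == value for attr, value in criteria.items())
    ]


# Hypothetical repository contents, for illustration only.
repository = [
    StoredItem(b"\x89PNG...", {"type": "image", "subject": "ducks", "year": 2012}),
    StoredItem(b"%PDF-1.4...", {"type": "document", "subject": "circuit design", "year": 2013}),
    StoredItem(b"\x89PNG...", {"type": "image", "subject": "circuit", "year": 2013}),
]

images_2013 = query(repository, type="image", year=2013)
print([item.metadata["subject"] for item in images_2013])
```

The query never opens the content field, so the quality of the answer depends entirely on how well the metadata was defined and maintained.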


Content


Content is all the data (structured and unstructured) needed for an application.


Example application: multichannel publishing



Multichannel publishing reuses content for publishing in different channels:


Websites


Printed books, brochures, etc.


Movies and other audiovisuals


broadcast or streamed


DVD


Multiple languages


Content Management System


A content management system (CMS) manages the unstructured data.



Types of content management systems:


Web Content Management (WCM)


Manages content for websites.


Digital Asset Management (DAM)


Manages graphics and multimedia content, not including text.


Document Management (DM)


Manages documents.


Enterprise Content Management (ECM)


Manages all the content for an entire enterprise, including business documents, emails, etc.


Component Content Management (CCM)


Manages content at the component level.


Architecture Overview


[Figure: architecture of the Shot Data Management System at the National Ignition Facility, Lawrence Livermore National Lab, which includes the Oracle Content Management System (CMS).]



Enterprise Content Management (ECM)


Five application areas of ECM:


Document management


Capture, editing, and distribution of office documents and files.


Web content management


Organize an enterprise’s web content and web publishing process.


Image management


Manage the process of scanning, quality control, metadata capture, and storage.


Digital asset management


Used by creative and marketing professionals to capture, create, and edit photos, videos, and illustrations.


Records management


Long-term archival of documents and records in compliance with government regulations.


Shot Data Management Workflows


Large display of many nodes in multiple “swim lanes”


Each node represents a unit of work to be performed.


Each swim lane represents a category of work.


Results flow from one node to the next based on rules.



The display is maintained in real time.


Currently active nodes are highlighted.

[Figure: “SXI Hohlraum LEH Monitor Analysis” workflow diagram, Rev 6-4-07. Labels include Instrument Analysis, Diagnostic Analysis, and Campaign Analysis. Nodes include Hohlraum, SXI Integrating Camera, Feature Mapping (superimpose LEH and perform line-outs), Bad Pixel (hot pixels, saturated regions), Correct CCD Instrument, Flat Field, Separate Regions, Background Correction, Region Separation, and Map LEH Outline. Some nodes are annotated with an author and a code type (IPT Production Code, RS Desktop Code).]

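
A workflow of this kind, with nodes as units of work, swim lanes as categories, and rules deciding where results flow next, can be sketched as follows. This is a minimal illustration and not the NIF implementation; the Node class, the run_workflow function, and the three example nodes are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Node:
    name: str
    lane: str                               # swim lane: the category of work
    work: Callable[[dict], dict]            # the unit of work to perform
    routes: list = field(default_factory=list)  # (rule, next node name) pairs


def run_workflow(nodes, start, data):
    """Run from the start node; the first rule that accepts a result picks the next node."""
    visited = []
    current = start
    while current is not None:
        node = nodes[current]
        visited.append(f"{node.lane}/{node.name}")   # stand-in for highlighting the active node
        data = node.work(data)
        current = next((target for rule, target in node.routes if rule(data)), None)
    return visited, data


# Hypothetical three-node workflow, one node per lane.
nodes = {
    "calibrate": Node("calibrate", "Instrument",
                      lambda d: {**d, "calibrated": True},
                      [(lambda d: d["calibrated"], "analyze")]),
    "analyze": Node("analyze", "Diagnostic",
                    lambda d: {**d, "score": len(d["pixels"])},
                    [(lambda d: d["score"] > 2, "summarize")]),
    "summarize": Node("summarize", "Campaign",
                      lambda d: {**d, "done": True}),
}

visited, result = run_workflow(nodes, "calibrate", {"pixels": [1, 2, 3]})
print(visited)
```

When no rule accepts a node's result, the run simply stops at that node, which is one simple way to model a workflow that waits for intervention.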

Enterprise Content Management (ECM)


ECM also provides:


Workflow and lifecycle management


Manage the process of coordinating and sequencing the development of content by different people within the enterprise.


Who worked on a piece of content, when, and what was done.


Quality control


Ensure that each piece of content has a high level of quality.


Version control


Track multiple versions of a piece of content during its lifecycle.


Security



Maintain access rights to a piece of content.


Automated processes


Scale up as the amount of content grows.


SPLASH Workflow

Obesity Workflow


ECM Applications


Knowledge management


A repository for an enterprise’s knowledge and experience.


Shared drive replacement


Easy way to access the content repository.


Enterprise portals, intranets, and kiosks


Keep employees and customers up to date with the latest personalized information.


Information publishing


Web, books, brochures, advertising literature, etc.


Collaborative content development


Manage many simultaneous developers working cooperatively (workflow, version control, etc.)


Case-Based Reasoning

SHIP Architecture

[Figure: SHIP architecture diagram. Labels: Raw data, Integrated information, Knowledge, Analysis, Matching Engine, Case base, Action.]


Metadata and Content Management


Metadata is the key to content management.



Structural format for the metadata


A metadata field is an attribute-value pair


XML format



Agree on common attribute terms


Is it “author” or “creator”?


Does “date” mean creation date or date last modified?
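
As a rough sketch of both points, the Python below serializes attribute-value pairs as XML and maps variant attribute names onto agreed terms. The CANONICAL_TERMS table, the element names, and the decision that “date” means the creation date are assumptions made for this example only.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary agreement: variant attribute names map to one agreed term.
CANONICAL_TERMS = {
    "author": "creator",
    "creator": "creator",
    "date": "created",        # assumption: "date" is taken to mean the creation date
    "created": "created",
    "modified": "modified",
}


def to_xml(fields):
    """Serialize attribute-value pairs as <metadata><field name="...">value</field></metadata>."""
    root = ET.Element("metadata")
    for attribute, value in fields.items():
        term = CANONICAL_TERMS.get(attribute, attribute)
        element = ET.SubElement(root, "field", name=term)
        element.text = str(value)
    return ET.tostring(root, encoding="unicode")


print(to_xml({"author": "R. Mak", "date": "2013-05-06"}))
```

Normalizing terms at the point where metadata is written means later queries only need to know the agreed vocabulary.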


Architecture Overview


CIP Data Product Navigator


The SPLASH Platform

[Figure: SPLASH platform diagram. Multi-disciplinary users provide models and data and use models and data.

SPLASH modules: Model and Data Registration; Model, Data, and Mapping Discovery; Model and Data Composition; Model Execution; Experiment Manager; Collaborative Reporting and Visualization.

SPLASH repository: contains Data, Model, Mapping, and Composite Model items, and Metadata that describes them:

- Model inputs and outputs
- Access and execution
- Data schemas
- Model and data locations
- Model and data semantics]



Domain-Specific Metadata Standards


IEEE Learning Object Metadata (LOM)


Educational material


Information and Content Exchange (ICE)


Ecommerce and information interchange between companies


Synchronized Multimedia Integration Language (SMIL)


Multimedia applications


ISO/IEC 11179


Used by the Department of Justice, Environmental Protection Agency, US Health Information Knowledgebase, and National Cancer Institute


Common Warehouse Metamodel (CWM)


Data warehousing


Library of Congress Digital Repository


1998 pilot project


See http://www.loc.gov/standards/metadata.html



Dublin Core


Traditional corporate content management


Domain-Specific Metadata Standards


The World Wide Web Consortium (W3C) has the Resource Description Framework (RDF)



A standard model for data interchange on the web.


Specify, identify, and reference metadata vocabularies from an XML file.


Use URIs to name the relationships between components.


Allow structured and semi-structured data to be mixed, exposed, and shared across different applications.


See http://www.w3.org/RDF/
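
To make the role of URIs concrete, here is a small sketch that represents metadata as RDF-style triples (subject, predicate, object) and prints them in N-Triples syntax using only the Python standard library. The predicate URIs come from the Dublin Core element namespace; the document URI and the literal values are hypothetical, and a real application would more likely use an RDF library.

```python
DC = "http://purl.org/dc/elements/1.1/"          # Dublin Core element namespace
DOC = "http://example.org/docs/lecture-may-6"    # hypothetical resource URI

# Each triple names a relationship (the predicate URI) between a resource and a value.
triples = [
    (DOC, DC + "title", "Unstructured Data and Content Management"),  # hypothetical title
    (DOC, DC + "creator", "R. Mak"),
    (DOC, DC + "date", "2013-05-06"),
]


def to_ntriples(triples):
    """Render triples with literal objects as N-Triples lines."""
    lines = []
    for subject, predicate, obj in triples:
        escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{subject}> <{predicate}> "{escaped}" .')
    return "\n".join(lines)


print(to_ntriples(triples))
```

Because every predicate is a URI, two applications that both use the Dublin Core “creator” term can share and merge these statements without agreeing on anything else.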



Metadata Creation


Automatically generated


“Self aware” attributes


Examples for photo images:


width, height, resolution, color palette


date and time the photo was taken


type of camera


name of photographer


Manually created


Human interpretation of the contents of the image


Examples:


photo of two ducks swimming in a pond


diagram of a particular circuit


document about the circuit design
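
Automatically generated metadata can often be read straight from the file itself. As a minimal sketch, the Python below extracts width, height, bit depth, and color type from the header of a PNG image. The png_metadata function is a hypothetical helper, and the sample bytes are built in memory and contain only the start of a PNG file, which is all the function reads.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_metadata(data: bytes) -> dict:
    """Read "self-aware" attributes from the IHDR chunk at the start of a PNG file."""
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # IHDR begins with: width (4 bytes), height (4 bytes), bit depth (1), color type (1).
    width, height, bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    return {"width": width, "height": height,
            "bit_depth": bit_depth, "color_type": color_type}


# Build the beginning of a 640x480 PNG in memory (signature, chunk length, IHDR fields).
ihdr = struct.pack(">IIBBBBB", 640, 480, 8, 2, 0, 0, 0)
sample = PNG_SIGNATURE + struct.pack(">I", len(ihdr)) + b"IHDR" + ihdr

print(png_metadata(sample))
```

Nothing in the file says that the picture shows two ducks in a pond; that kind of descriptive metadata still has to be supplied by a person.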


Manual Metadata Creation


Define use cases.


Different users will need different sets of metadata fields.


Similar to data marts of a data warehouse.

http://www.cmswire.com/cms/enterprise-cms/understanding-oracles-universal-content-management-metadata-profiles-011200.php?pageNum=2
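
One way to picture use-case-specific metadata is as a set of profiles, each selecting the fields one group of users works with, much as a data mart selects part of a warehouse. The sketch below assumes hypothetical profile names and field names; it does not reproduce any vendor's profile feature.

```python
# Hypothetical profiles: each use case sees only the metadata fields it needs.
PROFILES = {
    "legal": ["title", "author", "retention_date"],
    "marketing": ["title", "campaign", "image_rights"],
}


def profile_view(metadata, use_case):
    """Project a full metadata record onto the fields of one profile."""
    return {name: metadata.get(name) for name in PROFILES[use_case]}


# Hypothetical full record for one piece of content.
record = {
    "title": "Spring brochure",
    "author": "J. Doe",
    "retention_date": "2020-01-01",
    "campaign": "Spring 2013",
    "image_rights": "licensed",
}

print(profile_view(record, "legal"))
```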


Content Validation


Some content validation can be done automatically.


Example: Does this image have the correct resolution for display on a web page?


However, humans still do much of the validation.


“Validate early, validate often” during the content lifecycle.


Example:

[Figure: content passes among Content creator, Application designer, Content publisher, and End user, with three VALIDATED checkpoints.]


In the future, machine learning algorithms can enable more automated validation.
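
The image-resolution example is the kind of check that is easy to automate once the metadata exists. The sketch below is one possible form of such a check; the function name, the required fields, and the limits are all assumptions chosen for illustration.

```python
def validate_web_image(metadata, max_width=1200, max_height=1200, max_dpi=96):
    """Return a list of problems; an empty list means the image passes.

    The limits are hypothetical defaults, not values from any standard.
    """
    problems = []
    for name in ("width", "height", "dpi"):
        if name not in metadata:
            problems.append(f"missing metadata field: {name}")
    if problems:
        return problems                      # cannot check limits without the fields
    if metadata["width"] > max_width:
        problems.append(f"width {metadata['width']} exceeds {max_width}")
    if metadata["height"] > max_height:
        problems.append(f"height {metadata['height']} exceeds {max_height}")
    if metadata["dpi"] > max_dpi:
        problems.append(f"dpi {metadata['dpi']} exceeds {max_dpi}")
    return problems


print(validate_web_image({"width": 800, "height": 600, "dpi": 72}))
print(validate_web_image({"width": 4000, "height": 600, "dpi": 72}))
```

Running the same check at each handoff in the lifecycle is one way to apply “validate early, validate often” to the automatable part of validation.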


Content Query and Retrieval


Content query and retrieval requires matching metadata attribute values.


Exact matches vs. fuzzy matches



Queries take longer if there are more metadata fields.


Therefore, minimize the number of metadata fields for each type of content.
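
The difference between exact and fuzzy matching of attribute values can be sketched with Python's standard difflib module. The similarity threshold of 0.8 and the sample records are assumptions for this example; production systems typically use dedicated search indexes rather than a linear scan.

```python
from difflib import SequenceMatcher


def exact_match(items, attribute, value):
    """Items whose metadata value equals the query value exactly."""
    return [item for item in items if item.get(attribute) == value]


def fuzzy_match(items, attribute, value, threshold=0.8):
    """Items whose metadata value is similar enough to the query value."""
    def similarity(a, b):
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    return [
        item for item in items
        if attribute in item and similarity(str(item[attribute]), value) >= threshold
    ]


# Hypothetical email metadata records.
emails = [
    {"sender": "Ron Mak", "subject": "Project #5"},
    {"sender": "Ron Mack", "subject": "Surveys"},
    {"sender": "Alice Smith", "subject": "Lunch"},
]

print(len(exact_match(emails, "sender", "Ron Mak")))
print(len(fuzzy_match(emails, "sender", "Ron Mak")))
```

Each additional metadata field is one more value to compare per item, which is the cost behind the advice to keep the field set small.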


Example Metadata


Metadata model for an email repository


Based on the Dublin Core

Domain Model:

Managed Object: Emails
Content Type: Email
Responsible Parties: Sender, Recipient
Priority: Level, Type

Metadata Terms and Fields:

Content Type: Emails (Auto Selected)
Receive Date: (Date Field)
Sent Date: (Date Field)
Subject: (free form field)
Sender: (free form field)
Recipient: (free form field)
Priority Type: (predefined list)
    Value 1: General Correspondence
    Value 2: Legal Correspondence
    Value 3: Sales Correspondence
    Value 4: Human Resource Correspondence
Priority Level: (predefined list)
    Value 1: No action required
    Value 2: Immediate action required
    Value 3: Management action required

http://www.cmswire.com/cms/enterprise-cms/the-importance-of-metadata-in-content-management-009746.php?pageNum=2
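
The email metadata model above maps naturally onto typed fields, with the predefined lists expressed as enumerations so that only the listed values are accepted. The class and member names in this sketch are hypothetical; only the field names and list values come from the slide.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class PriorityType(Enum):
    GENERAL = "General Correspondence"
    LEGAL = "Legal Correspondence"
    SALES = "Sales Correspondence"
    HUMAN_RESOURCE = "Human Resource Correspondence"


class PriorityLevel(Enum):
    NO_ACTION = "No action required"
    IMMEDIATE_ACTION = "Immediate action required"
    MANAGEMENT_ACTION = "Management action required"


@dataclass
class EmailMetadata:
    receive_date: date              # date field
    sent_date: date                 # date field
    subject: str                    # free form field
    sender: str                     # free form field
    recipient: str                  # free form field
    priority_type: PriorityType     # predefined list
    priority_level: PriorityLevel   # predefined list
    content_type: str = "Emails"    # auto selected


# Hypothetical record, for illustration only.
example = EmailMetadata(
    receive_date=date(2013, 5, 6),
    sent_date=date(2013, 5, 5),
    subject="Contract review",
    sender="sender@example.org",
    recipient="recipient@example.org",
    priority_type=PriorityType("Legal Correspondence"),
    priority_level=PriorityLevel.IMMEDIATE_ACTION,
)
print(example.content_type, example.priority_type.value)
```

Looking up a value that is not in a predefined list, such as PriorityType("Urgent"), raises ValueError, which gives a simple form of automatic validation at data entry.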


Open-Source Content Management Systems


Java


Alfresco Community Edition



PHP


Drupal


Joomla


WordPress



Python


Plone


Proprietary Content Management Systems


Java


Alfresco Enterprise Edition


Documentum (EMC)


IBM Enterprise Content Management


Oracle ECM Suite



ASP.NET


SharePoint (Microsoft)


http://plonemetrics.blogspot.com/2009/12/plone-centric-view-of-cms-technology.html




Metadata Model vs. Taxonomy vs. Ontology


Metadata model


Metadata is data about data.


A metadata model is a template for a set of metadata fields for a given application domain.



Taxonomy


A classification of data according to a predetermined system.


Provides a conceptual framework for analysis or retrieval.


Organizes content into hierarchical relationships.


Metadata alone doesn’t necessarily do this.


Example taxonomy:

Kingdom → Phylum → Class → Order → Family → Genus → Species


Metadata Model vs. Taxonomy vs. Ontology


Ontology


Unlike the hierarchical taxonomy, an ontology is a set of data classifications in the form of a web.


More relationships between classifications beyond “class” and “subclass”.


Usually specified with an ontology language that is machine-readable (i.e., it can be parsed and compiled).


Can provide details about each relationship.
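
The structural difference between a taxonomy and an ontology can be sketched with two small data structures: a child-to-parent map for the hierarchy, and a list of labeled edges for the web of relationships. The terms and relationship names below are hypothetical, and a real ontology would normally be written in an ontology language rather than as Python literals.

```python
# Taxonomy: every term has exactly one parent, so a child -> parent map is enough.
TAXONOMY_PARENT = {"Mallard": "Duck", "Duck": "Bird", "Bird": "Animal"}   # hypothetical terms


def ancestors(term):
    """Walk up the hierarchy from a term to the root."""
    path = []
    while term in TAXONOMY_PARENT:
        term = TAXONOMY_PARENT[term]
        path.append(term)
    return path


# Ontology: labeled edges (subject, relationship, object) form a web, not a tree.
ONTOLOGY = [
    ("Duck", "is_a", "Bird"),
    ("Duck", "lives_in", "Pond"),
    ("Duck", "eats", "Insect"),
    ("Pond", "is_a", "Habitat"),
]


def related(term, relationship=None):
    """Return (relationship, object) pairs for a term, optionally filtered by relationship."""
    return [
        (rel, obj) for subj, rel, obj in ONTOLOGY
        if subj == term and relationship in (None, rel)
    ]


print(ancestors("Mallard"))
print(related("Duck"))
```

The taxonomy can only answer “what is this a kind of?”, while the ontology can also answer questions about other named relationships, such as where something lives.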


Metadata Model vs. Taxonomy vs. Ontology

[Figure: Taxonomy]
http://scriptoriumblogorium.blogspot.com/2011/06/in-beginning-there-was-taxonomy.html

[Figure: Ontology]
http://stick.ischool.umd.edu/innovation_ontology.html