Bridging the Gap between Data Models and Implementations: XMI2SQL

chunkyscreechServers

Dec 4, 2013 (3 years and 7 months ago)

115 views

Bridging
the Gap between Data Models and
Implementations
:

XMI2SQL







Eric Raymond Hartford







A
project

submitted in partial fulfillment of the requirements for the degree of







Master of Science







University of Washington Tacoma


2004







Program Authorized to Offer Degree:

Computing and Software Systems


University of Washington

Tacoma


Abstract


Bridging the Gap between Data Models and Implementations:

XMI2SQL


Eric Raymond Hartford


Chair of the Supervisory Committee:

Sam Chung, Ph.D.

Assistant Professor

Computing and Software Systems


The purpose of this project is to produce a tool that can generate relational tables
in Structured Query Language (SQL) from a given model in Extensible Markup
Language (XML) Metadata Interchange (XMI).
One of the challenges of
using
modeling techniques

for database design

is the
difficulty of

translating
a database
model
into

relational tables.
The purpose of
XMI2SQL

is

to reduce the work
required for translation
. As its input format, an XML format cal
led XMI
is
employed
. XMI is the standard data interchange representation for Unified
Modeling Language (UML)

models
. As its output format, XMI2SQL
use
s

SQL,
which is the de facto standard for data definition. The software
is

interfaced as
both a console
application and a web service. Validation of XMI2SQL
was
accomplished
using a supply chain management (SCM) data model provided by
the Association of Retail Technology Standards (ARTS). XMI2SQL has two
advantages over the current legacy SQL generators fo
r specific modeling tools.
First,
XMI2SQL

is compatible with any modeling tool that generates XMI from a
UML database model, because
it

uses an open and standards
-
based format for
input. Second, because web services are used for the APIs of the SQL gener
ator,
the APIs can be easily integrated with other applications regardless of the
platforms and the programming languages used.


i

Table of Contents

LIST OF FIGURES

................................
................................
................................
................................
.....
II

LIST OF TABLES

................................
................................
................................
................................
.....

III

GLOSSARY

................................
................................
................................
................................
................

IV

SECTION 1
-

INTRODUCTION

................................
................................
................................
.................

1

SECTION 2


OVERVIEW OF CHALLENG
ES

................................
................................
......................

2

C
HALLENGE
1



UML

IS BAD AT MODELING D
ATABASES
.

................................
................................
................

2

C
HALLENGE
2



SQL

IS NOT ST
RICTLY FOLLOWED BY
DBMS

SYSTEMS
................................
............................

2

C
HALLENGE
3



N
OT ALL DATABASE DESI
GNERS USE THE SAME H
ARDWARE
/

SOFTWARE
.

................................

3

SECTION 3


DEVELOPME
NT PROCESS

................................
................................
.............................

3

P
HASE
1



R
EQUIREMENTS
G
ATHERING

................................
................................
................................
........

3

P
HASE
2



D
ESIGN
/

A
NALYSIS

................................
................................
................................
.......................

6

Selecting Data Profile

................................
................................
................................
............................

6

Parsing Input

................................
................................
................................
................................
.........

6

Generating Output

................................
................................
................................
................................
.

6

P
HASE
3

-

I
MPLEMENTATION

................................
................................
................................
.........................

6

P
HASE
4



T
ESTING

................................
................................
................................
................................
......

7

SECTION 4


CONCLUSIONS

................................
................................
................................
...................

8

ii

List of Figures

iii

List of Tab
les


Table 1
-

Software used in design process of XMI2SQL

................................
...................

4

Table 2
-

Standards Summary

................................
................................
.............................

5

iv

Glossary



ANSI

American National Standards Institute


the USA standards
organization


API

Application Program Interface


an interface between dissimilar
programming languages / paradigms.


C#

Object oriented programming language for Microsoft .NET
environment. .NET pr
ovides excellent Web Service support.


DBMS

Database Management System


implementation of a data model.


IIS

Internet Information Services


Microsoft’s
flexible internet

server
software




ISO

International Standards Organization


the European standards

organization.


HTML

Hypertext Markup Language


language of the World Wide Web.


HTTP


Hypertext Transfer Protocol


transmission protocol for HTML
.


OMG

Object Management
Group


group responsible for UML and other
popular software related standards.


RD
B

Relational Database



database that implements the relational data
model.


SAX

Simple API for XML



a popular
API

for XML parsing
.


SCM

Suppl
y Chain
Management


coordination of product distribution.


SQL

Struct
ured Query Language


standard language for

communicating
with DBMS.


SQL/XML

A.k.a. SQLX



upcoming XML standa
rd for encoding RDB.


SQLX

See SQL/XML.


v

UML

Unified

Modeling Language


diagram based standard especially
suited for describing software systems.


UML2SQL

Open source solution for applyin
g UML mode
ls to RDBs.


W
3C

World Wide Web Consortium


group responsible for maintaining
the HTML and XML standards.


XMI

XML Metadata Interchange



standard methodology for encoding
UML diagrams in XML format.


XML

Extensible Markup Language


text based
hierarchical data format.
Popular with internet applications.


XOM

XML Object Model


open source API to simplify SAX
programming.



Zsqlml

An open source XML standard for encoding RDB.

1


Section 1
-

Introduction



Data modeling has become a pervasive p
ractice in the
database

industry.
Few industrial strength database solutions are crafted without mapping the
database before hand.
Those that fail to properly plan their data solution often end
up paying the price in lost productivity and extensive redes
ign.

As database
technology and design methodology matures, there is a growing need for modeling
techniques to aid in database design [12].

The product of
the data modeling

effort is a specification describing the
database before it has been implemented
.
Usually this begins as a conceptual,
high level Entity
-
Relationship diagram, and then is transferred to a more detailed
relational
Unified Modeling Language (
UML
)

diagram. The database community
has embraced UML, designed for communicating object orien
ted software
concepts, as its modeling standard. Since database modeling was not in mind
during its design, UML is not entirely suited to the task of being the standard
modeling language for database applications.

Unfortunately, current design tools foc
us on the modeling itself, and do not
make it easy for the model to be transferred into implementation.
Currently, both
the modeling software and the database management system accomplish
conversion with tightly coupled systems that are proprietary in na
ture [7].

Because software vendors control the conversion process, they also control
the formats to which the user is permitted to export their data. Software vendors
are economically motivated to use their
influence

to force users to stay with
products

that will allow the vendor to gain profits. As a result, users lose the
p
ower to convert their data into

formats

they would prefer to use
.

This practice
removes power from the user, and gives it to the CASE and DBMS vendors.


A solution employable for
all modeling and database systems would
empower users, and would permit them to more easily move their information to
different systems.

For these reasons, t
here is increasing need for a method to
easily convert models into implementations

using open stan
dards that put the
choices back into the hands of the users
.

To address these issues, a software tool
has been developed

with the goal of
generating relational tables in Structured Query Language (SQL) from a given
model in Extensible Markup Language (XM
L) Metadata Interchange (XMI). This
tool
has

be
en

named XMI2SQL. XMI2SQL decouple
s

the conversion from the
modeling and database systems, providing power to users to decide which system
they will choose to implement. XMI2SQL use
s

open standards to ensur
e optimal
compatibility.

2


Section 2


Overview of Challenges


Overcoming the challenges faced by XMI2SQL has

involved

an effort to
determine or establish a standard methodology for the representation of Relational
Databases (RDB) in UML, as well as to dev
elop an application and web service
that aid
s

in the implementation of models by generating the appropriate SQL
commands for the implementation. In addition, a real life example database
needed to be used to test the software.


Challenge

1


UML is

bad

a
t

modeling databases.

The Unified Modeling Language (UML) is the de facto standard for
modeling object
-
oriented software, the function for which it was designed. UML
was not initially intended for modeling database designs. However, UML is often
used to
model databases because of the need for such models and the lack of
database
-
specific tools and standards [1]. This non
-
formalized methodology for
modeling databases presents a problem for the automated implementation of such
models.

To solve this issue,
a formalized methodology needed to be invented to
force UML to encode central database concepts in a consistent manner. Two such
systems currently exist


the Rational Data Profile, and the Agile Data Profile.
After weighing their relative benefits, the
Agile Data Profile was chosen as the
standard for XMI2SQL.




Challenge

2


SQL is

n
ot strictly followed by
DBMS systems

Structured Query Language (SQL) is the standard language for sending ad
hoc commands and queries to database management systems. SQL
can also be
used to represent data tables, even though it was not originally designed to do so.
To accomplish this

representation
, the SQL commands used to create the table
structure are saved to a text file. These files are usually named with an extensi
on
of “.sql”. The problem is, Database Management System (DBMS) designers do
not strictly follow the ANSI/ISO SQL:1999 standard. Every DBMS is different; a
standards
-
compliant SQL command may work on one DBMS but not another [14].

To solve the
se
, XMI2SQL

only use
s

commands that are available in the
majority of DBMS systems.
In the future
, XMI2SQL
may

also include
mechanisms for the
user to choose an output format to use more platform specific
commands.


3


Challenge

3


Not all database designers use the s
ame hardware / software.

One additional problem is the lack of a common computing platform.
Different businesses have different modeling conventions. They may have
differing operating systems, or file systems. They all have different processes, and
so t
he need exists for a single solution for conversion from any UML modeling
system on any computer system to any DBMS on any computer system.

To address
the cross
-
platform issues, XMI2SQL uses web services that
operate over Hypertext Transfer Protocol (HTTP)
. Virtually all modern
computing systems are able to communicate via HTTP. Furthermore, XMI2SQL
use
s

open standards that apply to all available computing environments.


Section 3


Development Process


XMI2SQL

was

executed in four phases: requirements g
athering,
design/analysis, implementation, and testing.
Requirements gathering and design
consumed the largest amount of time, while implementation and testing required
relatively little time.


Phase 1


Requirements Gathering

Three steps
have

be
en

taken
to accomplish the requirements gathering
phase: determining the proper formatting of the XMI input file, creating a
methodology of conversion, and defining the proper form
atting of the SQL output
file. Before the design effort began, there was a need to
gain a sense of the
technology that was currently available. A summary
of software used to help with
development of XMI2SQL
is indicated on Table 1.


Software


Author

Primary Resource

Description

Internet
Information
Services

(IIS)

Microsoft

http://www.
microsoft.com/windowsserve
r2003/iis

Provides web server for publishing
web services.

NET
Framework

Microsoft

http://
msdn.microsoft.com/netframework

Provides the common framework
library available to all Windows
computers. Provides C# language
runtime, XM
L parsing libraries, and
Web Service libraries.

UML2SQL

Hannes
Kegel and
Ingo
Kegel

http://sourceforge.net/projects/uml2sql

UML2SQL solves the problem of
conversion from XMI to SQL, and
applies the SQL commands to a
DBMS server. Parts of this code
may be

usable to reduce
development time.

4


Visual
Studio
.NET

Microsoft

http://msdn.microsoft.com/vstudio/

Provides environment for .NET
software development.

Zsqlm
l

Zenark
Ltd.

http://sourceforge.net/projects/zsqlml

Zsqlml is an XML markup language
for SQL. I
t includes tools for
conversion of XML to SQL. There
may be useful algorithms in this
code that will be applicable to
XMI2SQL.

Table
1

-

Software used in design process of XMI2SQL


In addition to software tools, several standards
needed to be understood as
background knowledge. These include Extensible Markup Language (XML),
Unified Modeling Language (UML), XML Metadata Interchange (XMI),
Structured Query Language (SQL), and SQL/XML.

Table 2 contains a summary
of the standards XM
I2SQL conforms to.

XML is a hierarchical, verbose, plaintext data format [5]. Primary among
its many uses is the conversion of data into a standard format for transport
between systems [4]. It is generally not the most efficient way to store and
manipulat
e data, though for small applications it can be acceptable. XML can be
used to represent data in a database [3]. XML is maintained by the World Wide
Web Consortium (W3C).

UML is a standardized modeling language for modeling software systems.
UML has t
he capability of modeling at several levels, from highly abstract to
highly defined software systems. UML is also the de facto standard for data
modeling, even though it is not specifically designed for that function [8]. UML is
maintained by the Object
Management Group (OMG).

XMI is a standard for the encoding of UML models into an XML format
[13]. It is a way to encode UML diagrams in a standardized data format either for
transmission over a network, or for conversion from one software system to
anot
her. It is not designed to be the most efficient means of storing and accessing
UML data, but rather as a medium for transfer between systems, which will then
encode the structure in a more efficient binary format. XMI is maintained by
OMG.

SQL is a lang
uage for sending ad hoc commands and queries to a database
[2]. It can be used to create databases and tables, add or edit data in the tables,
specify relationships, and perform query operations on the database. Though SQL
was not originally intended to
be a storage format for database information, that is
how XMI2SQL will be using it [14]. In particular, XMI2SQL will be using
SQL99, because it is widely supported. SQL, while strictly defined by ANSI/ISO,
5


is not strictly adhered to by the database indus
try. In addition, there is no central
repository for SQL information.

SQL/XML, also known as SQLX, is an upcoming ANSI/ISO standard for
encoding relational databases in an XML format [6]. This format is designed to be
a transfer medium for RDB data, unl
ike SQL. [9] Also unlike SQL, this format is
being accepted by the database software community at large, and will be followed
explicitly [6]. Because all databases will support it, this format will eventually be
the best method of encoding database tables
for transfer between systems, but that
time has not yet arrived. Therefore, if XMI2SQL supports this format, it will be in
addition to SQL.



Standard

Maintainer

Resources

Extensible
Markup
Language
(XML)

World Wide
Web
Consortium
(W3C)

W3C’s XML page

http://www.w3.org/XML/


W3C’s XML Specification

http://www.w3.org/TR/REC
-
xml/


XML FAQ

http://www.ucc.ie:8080/cocoon/xmlfaq

Unified
Modeling
Language
(UML)

Object
Management
Group (OMG)

OMG’s UML Introduction:

http://www.omg.org/gettingstarted/what_is_u
ml.htm



OMG’s UML Page:

http://www.omg.org/uml/



OMG’s UML Specification:


http://www.omg.org/cgi
-
bin/apps/doc?formal/03
-
03
-
01.pdf

XML
Metadata
Interchange
(XMI)

OMG

OMG’s XMI
P
age
:
http://www.omg.org/technology/documents/formal/xmi.htm

Structured
Quer
y
Language
(SQL)

American
National
Standards
Institute
(ANSI) /
International
Organization
for
Standardization

(ISO)

SQL FAQ

http://epoch.cs.berkeley.edu:8000/sequoia/dba/montage/FAQ/SQL_TOC.html

SQL/XML

ANSI/ISO

SQL/XML page

www.sqlx.org


SQL/XML specifi
cation (draft)

http://sqlx.org/5wd
-
14
-
xml
-
2002
-
08.pdf

Table
2

-

Standards Summary

6




To help find the proper formatting of XMI and determine the methodology
for conversion, similar projects were researched and identified and dissect
ed.
These included Zsqlml and UML2SQL.
By studying these open source
implementations, the Principle Investigator (PI) was able to gain an understanding
of strategies for approaching the problem.


Phase 2


Design / Analysis

During the design/analysis pha
se design
o
f the conversion algorithm
has

take
n

place. The implementation phase include
d

the creation of the core class as
well as the deployment of the web service interface. Finally, the testing phase will
involve the conversion of a very large concept
ual data model into a working
implementation using the developed tool.


Selecting Data Profile


Parsing Input


Generating Output




Phase 3
-

Implementation

XMI2SQL will be written in Java, so that it can be maintained by other students
who are familiar wi
th Java, and so that it will be cross platform compatible. Java
is an object oriented language with libraries available for dealing with XML, and it
has very good string manipulation capabilities. Speed is not an issue for this
application, as it does no
t deal with very large amounts of data or time
-
consuming
algorithms.

The conversion algorithm should be of order no larger than O(N log N). It is a
one
-
to
-
one conversion which suggests an order of O(N). However, because XML
is a hierarchical language a
nd it will be in a tree
-
like structure, sorting, searching,
and arranging elements may push it up to O(N log N). A O(N^2) algorithm will
be used only as a last resort.

XML Object Model (XOM) will be used as a front end to the Xerces Simple API
for XML (SA
X) parser to parse the incoming XMI file, and will convert the XMI
file into an object
-
oriented representation of the data tree. Then XMI2SQL’s
conversion algorithm will decide whether the file is valid, and if so, how to
convert it into either a standard
ized format or directly into SQL commands.


7


At least two formats already exist for standardizing XML representation of RDBs.
One is called Zsqlml. This format is an open source project that provides
conversion programs and other utilities. The Zsqlml pr
oject was put out for use
during the interim in which there is no industrial standard. The other format is
called SQL/XML. It is currently in development by the H2.3 Task Group,
formally the SQLX Workgroup. This format has the support of IBM, Microsoft,
Oracle, Northrop Grumman, and Sybase, and will be supported by all the major
database applications in the coming years [10]. The purpose of using an
intermediate RDB
-
XML format would be to make the tool as extensible as
possible and to allow for code reus
ability.

This project will involve coding the implementation classes in Java Standard
Edition, and setting up a web service on a publicly accessible server. The service
will be tested under high load conditions. Use of a computer that is connected to
the

Internet will be required, and this resource will be leveraged from existing
program resources at UWT. The project will require the use of the java compiler
published by Sun Microsystems, Inc., which is distributed without charge. The
application will r
un on computers that are currently available for student use. The
Association for Retail Technology Standards (ARTS) has created a standard data
model called Retail Data Model (RDM) that is used for retail supply chain
management. This complex model will

be used to test the capabilities of
XMI2SQL.


Phase 4


Testing


SCM is the computerized management of the process of transportation of products
from supplier to consumer. Bertoline and Wiebe (2002) provide the following
definition:


An electronic altern
ative to the traditional paper chain, providing companies with
a smarter, faster, more efficient way to get the right product to the right customer
at the right time and price. Combines the power of the Internet with the latest
technology, enabling partici
pating suppliers to access up
-
to
-
date company
information and enabling companies to better manage and track supply and
demand. [9]


SCM provides a well
-
developed model with which XMI2SQL can be tested.


8


Section
4



Conclusions


9


End notes

10


Bibliography

11


A
ppendices