Why Store Data in XML? VS RDBMS

righteousgaggleData Management

Jan 31, 2013 (4 years and 6 months ago)

141 views

Comparing

RDMS to XML

Why Store Data in XML? VS RDBMS



Today's businesses continually accumulate data, whether it's customer information,
production information, or any of thousands of other categories of information. This data
needs to be stored some place, and better yet, it needs to be stored in a
relational format
for easy retrieval and integration with other data.



RDBMS systems such Oracle, Microsoft SQL Server, MySQL, PostgreSQL, and many others
are used to perform such tasks


when security, speed, and efficiency pose concerns.


RDBMS systems offer a great solution whenever internal or company
-
developed applications
require access to the data.


The potential problem arises either when one application needs to access data of another
application but does not have direct access to that application's database system, or when
data need to be ported between different database systems (especially true if the systems
are from different vendors).


RDBMSs store their data in binary formats that are not directly accessible by other
databases. Any external application wishing to interact with this data needs to utilize a
database Application Programming Interface (API) provided by the RDBMS specifically for
that product or a generalized interface, such as the MS SQL engine

What we will cover

XML from a Data Practitioner’s standpoint

Relational databases and schemas

A comparison of relational and XML architecture

What each technology is good at


and when to use it.

Relational metadata and XML metadata

How to navigate a relational schema vs. an XML schema

Gathering data (queries) in a database vs. an XML document

Differences in flexibility between the two

Cost considerations

Data integrity considerations

So…what do we do?

XML: A Data Practitioner’s View (1)

XML is a specification for designing tag
-
based languages.

The specification allows for:

Metadata (XML Schemas) that define

Valid data structures

Defining of user data types

Valid lists of values, ranges, and patterns

Optionality/cardinality

Elements and Attributes

Reusability of data, data types, and schemas

Creation of instance documents based on an
XML Schema

XML: A Data Practitioner’s View (2)

The potential exists for:

Industries to agree on an XML
-
based language
for data exchange.

Exchange of XML instance documents between
trading partners

An entire industry has grown up around:

Providing XML tools (and repositories)

Ongoing development of standards

XML has gained very wide acceptance!

XML Content

Here is an easy
-
to
-
understand sample of XML:

<?xml version = "1.0" encoding = "UTF
-
8"?>

<DataTransmission xmlns:xsi = "http://www.w3.org/2001/XMLSchema
-
instance"
xsi:noNamespaceSchemaLocation = "PatientSearchResponse.xsd" Source = "CentralPatient" Target = "Store042"
MsgTypeCode = "PSRS" MsgTypeDesc = "PatientSearchResponse" FoundFlag = "true">


<PatientList>



<PatientIdFields>




<FirstName>Maria</FirstName>




<LastName>Montes</LastName>




<Birthdate>1951
-
11
-
05</Birthdate>




<Gender>F</Gender>




<StreetAddress>1969 Ygnacio Valley Road</StreetAddress>




<CityAddress>Walnut Creek</CityAddress>




<StateCode>CA</StateCode>




<ZipCode>94597</ZipCode>




<PatientID>109993345</PatientID>



</PatientIdFields>


</PatientList>

</DataTransmission>

The XML Document(1)

An XML document is:

A self
-
contained set of structured (“tagged”) data

Decoupled from the source systems

Tagged data in the document does not change when
the source system data changes

The document represents a snapshot of the source
system data at a particular moment in time.

Packaged with the metadata (tags) for a specific
business transaction

The XML Document (2)

An XML Document has:

Semantic content

Values (tagged data)

Metadata (the tags)

Order

Hierarchy

Structure

The XML Schema

An XML document is usually (but not
always) validated by an XML Schema.

The XML Schema provides the information
on whether the XML document “followed
the rules” set up in the XML Schema.

An XML Schema is an
agreement

between
the sender and the receiver of a document as
to the structure of that document.

Elements vs. Attributes

Elements:

Basic building blocks of XML

Contain content which can be a structure

Attributes

Specify additional information about an element.

Contain only simple type content

Some data could be either an Element or an
Attribute (so you need standards on how to
decide which to use).

Element & Attribute in XML Schema

Element

and
Attribute

declaration:

<xsd:element name = "DataTransmission">


<xsd:complexType>



<xsd:sequence>




<xsd:element ref = "FirstName" minOccurs = "0"/>




<xsd:element ref = "LastName" minOccurs = "0"/>




<xsd:element ref = "Phone" minOccurs = "0"/>




<xsd:element ref = "Birthdate" minOccurs = "0"/>




<xsd:element ref = "Gender" minOccurs = "0"/>




<xsd:element ref = "StreetAddress" minOccurs = "0"/>




<xsd:element ref = "CityAddress" minOccurs = "0"/>




<xsd:element ref = "StateCode" minOccurs = "0"/>




<xsd:element ref = "ZipCode" minOccurs = "0"/>




<xsd:element ref = "SSN" minOccurs = "0"/>





<xsd:element name = "SafetyCapDate" type = "xsd:date"/>



</xsd:sequence>



<xsd:attribute name = "Source" use = "required" type = "xsd:string"/>



<xsd:attribute name = "Target" use = "required" type = "xsd:string"/>



<xsd:attribute name = "MsgTypeCode" use = "required" type = "MsgTypeCodeType"/>



<xsd:attribute name = "MsgTypeDesc" use = "required" type = "xsd:string"/>


</xsd:complexType>

</xsd:element>

Element and Attribute XML Document

Element

and
Attribute

content:

<DataTransmission xmlns:xsi = "http://www.w3.org/2001/XMLSchema
-
instance" xsi:noNamespaceSchemaLocation
= "PatientSearchRequest.xsd"
Source = "Store599" Target = "CentralPatient" MsgTypeCode = "PSRQ"
MsgTypeDesc = "PatientSearchRequest"
>


<FirstName>Maria</FirstName>


<LastName>Montes</LastName>


<Birthdate>1951
-
11
-
05</Birthdate>


<Gender>F</Gender>


<StreetAddress>1969 Ygnacio Valley Road</StreetAddress>


<CityAddress>Walnut Creek</CityAddress>


<StateCode>CA</StateCode>


<ZipCode>94597</ZipCode>


<SSN>561
-
88
-
9208</SSN>


<SafetyCapDate>2001
-
05
-
22</SafetyCapDate>

</DataTransmission>

Simple data types in an XML Schema

Comes with “atomic” simple data types

Integer, boolean, date, decimal, string, etc.

You can build user
-
defined simple data types

Built on the included “atomic” data types

Allows declaration of

valid values, ranges, Patterns, Length, total digits

And more…

Attributes or Elements can be of a simple data
type (either atomic or user
-
defined).

Simple data type examples

<xsd:simpleType name = "SevenPlaceInteger">


<xsd:restriction base = "xsd:integer">
builds on atomic simple data type



<xsd:totalDigits value = "7"/>


</xsd:restriction>

</xsd:simpleType>

<xsd:simpleType name = "GenderType">


<xsd:restriction base = "xsd:string">



<xsd:enumeration value = "M"/>



<xsd:enumeration value = "F"/>



<xsd:length value = "1"/>


</xsd:restriction>

</xsd:simpleType>

<xsd:simpleType name = "RelationshipCodeType">


<xsd:restriction base = "xsd:string">



<xsd:enumeration value = "self"/>



<xsd:enumeration value = "spouse"/>



<xsd:enumeration value = "dependent"/>



<xsd:enumeration value = "other"/>


</xsd:restriction>

</xsd:simpleType>

<xsd:simpleType name = "SevenPlacePositiveInteger">
builds on custom simple data type



<xsd:restriction base = "SevenPlaceInteger">



<xsd:minInclusive value = "0"/>


</xsd:restriction>

</xsd:simpleType>

Complex data types in XML Schema

Builds a structure of Elements.

Each subelement is either a simple data type or
another structure of Elements.

Only Elements can be of a complex data type.

Can be named and reusable or anonymous and
used only by a single Element.

Can be an extension or restriction of another
complex type.

Complex data type examples

<xsd:complexType name = "AddressType">
declaration of named complex data type


<xsd:sequence>



<xsd:element ref = "StreetAddress"/>



<xsd:element ref = "CityAddress"/>



<xsd:element ref = "StateCode"/>


</xsd:sequence>

</xsd:complexType>

<xsd:element name = "WorkAddress" type = "AddressType"/>
association of Element with named complex data type


<xsd:complexType name = "AddressWithCountryType">
new complex data type extends existing complex data type


<xsd:complexContent>



<xsd:extension base = "AddressType">




<xsd:sequence>





<xsd:element name = "CountryCode" type = "xsd:string"/>




</xsd:sequence>



</xsd:extension>


</xsd:complexContent>

</xsd:complexType>

<xsd:element name = "PatientInsurance">
element with anonymous complex data type


<xsd:complexType>



<xsd:sequence>




<xsd:element ref = "Patient"/>




<xsd:element ref = "TPMembership" minOccurs = "0" maxOccurs = "unbounded"/>



</xsd:sequence>


</xsd:complexType>

</xsd:element>

Using the XML Schema

Source

database

Extract

program

Data

XML

Schema

XML

Document

Target

database

Parse

program

XML

Schema

Data

Network

Reusing XML Schemas

XML Schemas can
build on each other to
provide reusability.

Statecode.xsd

Base

Definitions.xsd

<xsd:include schemaLocation =
"StateCodes.xsd"/>

<xsd:include schemaLocation =
"BaseDefinitions.xsd"/>

<xsd:include schemaLocation =
"BaseDefinitions.xsd"/>

<xsd:include schemaLocation =
"BaseDefinitions.xsd"/>

Patient

Search

Request.xsd

Patient

Search

Response.xsd

Patient

Update

Request.xsd

An XML Schema example

<xsd:element name = "Patient">


<xsd:complexType>



<xsd:sequence>




<xsd:element ref = "PatientID"/>




<xsd:element ref = "FirstName"/>




<xsd:element ref = "LastName"/>




<xsd:element ref = "Birthdate"/>




<xsd:element ref = "Gender"/>




<xsd:element name = "PrivacyInd" type = "xsd:boolean"/>




<xsd:element ref = "StreetAddress"/>




<xsd:element ref = "CityAddress"/>




<xsd:element ref = "StateCode"/>




<xsd:element ref = "ZipCode"/>




<xsd:element ref = "EMailAddress" minOccurs = "0"/>




<xsd:element ref = "SSN" minOccurs = "0"/>




<xsd:element name = "HIPAANotifInd" type = "xsd:boolean"/>




<xsd:element name = "SafetyCapInd" type = "xsd:boolean"/>




<xsd:element name = "SafetyCapDate" type = "xsd:date"/>




<xsd:element ref = "StatusCode"/>




<xsd:element ref = "Doctor" minOccurs = "0"/>




<xsd:element ref = "Phone" maxOccurs = "unbounded"/>




<xsd:element ref = "PatDrugAllergy" minOccurs = "0" maxOccurs = "unbounded"/>




<xsd:element ref = "OtherDrugTaken" minOccurs = "0" maxOccurs = "unbounded"/>



</xsd:sequence>



<xsd:attribute name = "PrivacyInd" type = "xsd:boolean"/>


</xsd:complexType>

</xsd:element>

The Structure of an XML Schema


Elements in an XML Schema
are hierarchical.


To expand the hierarchy with
this tool (Tibco’s XML
Authority), click here.

Expanding an Element

This is the result you
get


you can now see
the elements that
make up the structure
of the the
OtherDrugTaken

element.

Managing XML Schemas

Avoid chaos by managing XML metadata
across the Enterprise:

Create reusable base definition schemas

Create and document:

Widely used elements (with their attributes)

Complex data types

Simple data types

Keep track of which schemas use other schemas

Keep track of which documents are validated by
which schemas (XML repository).

XML Architecture

XML is hierarchical:

Patient

Insurance

Prescription

Doctor

Drug

Fills

Doctor

Drug

Claim

This hierarchy is
useful for starting
with a patient and
finding all the
information about
them


such as a
list of their
prescriptions and
when the
prescriptions were
filled.

XML Hierarchy Revisited

The Hierarchy can change depending on
what the XML document is used for.

Prescription

Drug

Fills

Doctor

Drug

Claim

Patient

This version of the hierarchy is
useful for starting with a
Prescription and finding all the
information about it, including the
Patient and Drug.

Each version of
the hierarchy
serves a different
purpose

Database Architecture

Database architecture is relational:

Normalized to eliminate data redundancy

Join on any two columns that have the same
data type.

Foreign keys can enforce data integrity

Relational Metadata


the Schema

Relational metadata is stored in the database

Database control tables fully define the structure
of the database.

Without the DBMS metadata the contents of the
database are worthless.

Completely self
-
contained (not reusable)

Tables are structured, each column is a “bucket”
for a specific kind of data

In most databases, the metadata does not include
descriptions, so a Data Dictionary is necessary.

XML Metadata


the Document

Metadata built into the document

Every element has a tag to tell you where the
data is stored in the document.

Descriptive tags give structure to the document
and tell you what the data means (sort of).

“Sort of” because it only tells the tag name, so
this only has meaning to someone who already
understands what the element or attribute
means.

Document cannot be parsed for storage on its
own. What else is needed?…

XML Metadata


the Schema

An XML Schema (or DTD) is needed to:

Provide standardization (basis of agreement)

Allow meaningful parsing and data storage

Specify agreement on document structure

A data dictionary is still necessary to
provide definition for Elements and
Attributes

Without an XML Schema, a document is
essentially only good for transmitting
blocks of data for humans to read.

Comparing XML to RDMS Metadata

An XML Schema establishes the valid structure of an
XML document, like a database schema establishes the
valid structure of a database.

Database Architecture

XML Architecture



Data Model



Cardinality



Optionality



Data Type



Describes Relationships



Has Metadata



Generate DDL



XML Schema



Cardinality



Optionality



Data Type



Describes Relationships



Has Metadata



Valid XML



Sequence of Elements





Final Result: Database


Final Result: Valid XML Document





What are relational databases good at?

Data Storage

General purpose data storage and retrieval

Used for many purposes, such as queries and
analysis

Generalized view of data for shared use

Ideally shared across business units or the
Enterprise

Works well to store the contents of an XML
document
.

What is XML good at?

Data Exchange

Exchange of data in a document

Usually designed for a specific communication

Works well to move data between databases

Important when source or target database is outside
your firewall.

Usually don’t have direct access to such databases.

Works well (with style sheets or XSLT) to
display data on the web because browsers
inherently are designed to display documents.

When do I use each technology?

RDBMS

Store data

Query data

Mine data

Create generalized reports

XML

Transmit data

Exchange data with
outside agencies

Replace flat files

Create specific reports

Which one?

Use them together

Each for its own purpose

Build an infrastructure that:

Creates XML document (using XML schemas) from database contents.

Parses XML documents to store their contents in a database

Relationships: RDBMS

RDBMS

Explicit table to
table: if not declared
by a foreign key, it
doesn’t exist.

Patient
PK
PatientID
SERIAL
BirthDate
DATE
Gender
CHAR(1)
FirstName
CHAR(25)
LastName
CHAR(35)
PrivacyInd
CHAR(1)
StreetAddress
CHAR(50)
CityAddress
CHAR(35)
StateCode
CHAR(2)
ZipCode
CHAR(9)
HIPAANotifInd
CHAR(1)
EMailAddress
VARCHAR(70)
SSN
CHAR(9)
SafetyCapInd
CHAR(1)
SafetyCapDate
DATE
StatusCode
CHAR(2)
FK1
DoctorID
INTEGER
ThirdPartyInsurance
PK
TPInsuranceID
VARCHAR(20)
AcceptAssignFlag
CHAR(1)
AcceptSecondBillFlag
CHAR(1)
BillMedcrFirstFlag
CHAR(1)
BillingMethodCode
INTEGER
CarrierTypeCode
CHAR(2)
DrQualifierTypeCode
CHAR(3)
ExpireDate
DATE
GroupCode
VARCHAR(15)
MbrIDLengthNum
SMALLINT
MedicaidStateCode
CHAR(2)
NCPDPTypeCode
CHAR(3)
PayorIDTypeCode
CHAR(2)
PharmacyIDType
CHAR(10)
PlanCode
CHAR(8)
PlanName
VARCHAR(20)
ProcessorTPIDCode
CHAR(10)
ProgramCode
CHAR(5)
ProgramName
VARCHAR(20)
RefrlPhoneNum
VARCHAR(20)
ReminderFlag
CHAR(1)
RoutingCode
CHAR(6)
TPIDNumber
CHAR(10)
TPMembership
PK
TPMembershipID
SERIAL
CardholderLastName
CHAR(35)
CardholderFirstName
CHAR(25)
CardholderRelationshipCode
CHAR(2)
CardholderID
VARCHAR(20)
CopayAmt
DECIMAL(8,2)
InsCoverageType
SMALLINT
IssueDate
DATE
MembTypeCode
SMALLINT
TerminationDate
DATE
FK1
PatientID
INTEGER
FK2
TPInsuranceID
VARCHAR(20)
has membership by / is for
is for / is in
Relationships: XML

XML

Implied
positioning:

Part
-
of relationship
implied by
positioning

Parallel elements
have same
relationships as
sibling elements

<Patient>


<PatientID>10988453</PatientID>


<FirstName>Maria</FirstName>

<LastName>Montes</LastName>


<Birthdate>1951
-
11
-
05</Birthdate>


<Gender>F</Gender>


<PrivacyInd>false</PrivacyInd>


<StreetAddress>1969 Ygnacio Valley Road</StreetAddress>


<CityAddress>Walnut Creek</CityAddress>


<StateCode>CA</StateCode>


<ZipCode>94597</ZipCode>


<HIPAANotifInd>true</HIPAANotifInd>


<SafetyCapInd>false</SafetyCapInd>


<SafetyCapDate>2000
-
01
-
01</SafetyCapDate>


<StatusCode>A</StatusCode>


<Phone>



<PhoneID>45569009</PhoneID>



<AreaCode>925</AreaCode>



<PhoneNumber>5556964</PhoneNumber>



<PhoneTypeCode>RS</PhoneTypeCode>


</Phone>


<PatDrugAllergy>



<PatDrugAllergyID>43325564</PatDrugAllergyID>



<InformedDate>2001
-
01
-
01</InformedDate>



<DrugID>432678945</DrugID>


</PatDrugAllergy>

</Patient>

Querying: XML vs. RDBMS

Relational

Easy to build queries by:

Navigating the database joins …
or



Creating joins on any pair of tables that share
matching columns

XML

Difficult to build queries because:

Must stick to the structure of the document

Pointer
-
based navigation is restrictive

Navigating in XML for Queries

Must follow the structure of the document.

Example: DOM Node navigation:

getFirstChild

getLastChild

getPreviousSibling

getNextSibling

getParentNode

Structure Flexibility : Relational vs. XML

Relational: Rigid structure

Lots of work to change the structure because of
foreign keys, Views, Stored procs, and triggers

Plus programs that assume the database looks a
certain way!

XML: Flexible structure

Easy to change the structure of documents to
account for new data, data type changes

Everything is text so it is easy to implement

Or is it?

XML Flexibility

Changing the structure of an XML document is
easy.

Changing the XML Schema is easy too, but have
you considered that you must:

Get agreement from business partners

Change the programs that create documents from the
database and parse documents for the database.

Version your XML Schemas if you want to be able
to open and parse historical documents (do you keep
them?).

Data Integrity: Relational vs. XML

Both provide ways to control data types,
ranges, patterns, valid values, and
optionality.

But XML does not provide data validation
via look
-
up as relational does with foreign
keys.

However, this lack does make the XML
Schema a less rigid structure than a
database schema.

Cost Considerations

People always ask


which is cheaper to implement?

XML is essentially all text, so it is easy to create, change
and manage.

XML doesn’t need the infrastructure that an RDBMS
does.

XML tools are cheap ($99!)

But does this cost difference really matter?

The two technologies do different things.

You can’t use one as replacement of the other.

If you need it, you need it.

So, XML is cheaper


and it doesn’t really matter.

So what do we do?

Do you remember the previous slide?

The two technologies do different things

They complement each other:

RDBMS stores data in a generally usable way

XML exchanges data in a specific way is easy to
display on the web.

They both present challenges for metadata
management.

So…we use them both for the appropriate
purpose.

Thank you!