DATABASE PROCESSING CONCEPTS

Data Management

31 Oct 2013


Copyright

Jay Lightfoot, Ph.D.

ACS 485











485_Intro.doc    11/01/13 7:27 AM


DATABASE PROCESSING CONCEPTS


I. PRIMARY POINTS OF LECTURE SERIES



View data as a valuable corporate resource



Be familiar with the concept of IRM (information resource management)



Basic components of a database system



Comparison of File Processing and Database Processing



Advantages/Disadvantages of File and Database Processing



Costs and Risks of DBMS approach


II. DATA AS A CORPORATE RESOURCE


In the future, those companies that succeed will be those that treat data as a valuable corporate
resource. In the past, data was thought of as the private domain of functional areas. This was a
natural outgrowth of the technology of the time. More formally stated, this is the concept of
IRM...


A. Information Resource Management


25 years ago this aspect of database was not taught. You just jumped right into the technical
aspects. Not so now. Database must be viewed in the larger context as part of the corporate
information resource. One of the primary purposes of information is to support managerial
decision making. The success of a company depends on the quality (and speed) of management
decisions. This is a fairly new way to view database. It greatly affects the way systems are
designed and thought of.


The main point is that you need to design databases within the overall context of the company
(no longer can you ignore corporate structure and build strictly to functional specifications).


III. FILE VERSUS DATABASE PROCESSING ENVIRONMENTS


Current Database systems are the end result of 40 years of evolution. Knowing the history of
computer systems helps to understand why things are as they are and where they may be going.
Also, it helps avoid making the same mistakes again.



A. Brief History


In the beginning there were no computers. Businesses relied on paper systems and written
procedures. Some very clever methods were devised (and are still used today). The problem
with these systems was that they (partial list):



could not grow infinitely



were not responsive or flexible enough



depended upon dedicated (life-long) workers



did not provide companies with a competitive edge


One of the results of these manual systems was that each functional area "owned" its data and
became very possessive of it. This was OK for the time. When computers were introduced, they
gave companies a way to automate manual tasks. (In fact, the title "computer" used to apply to a
person who did manual computations.) Computers allow a company to:



speed up tasks (without infinite personnel)



get better task integrity (accuracy)



grow companies bigger


This was a big deal. Much fanfare. People expected a lot. The big companies were the first to
buy computers. The early computers were very expensive and not very user-friendly. They cost
so much that they were applied to the areas where the chance of success was best (i.e., pay-back
of the huge investment). The best area to get quick pay-back (and lots of good PR) was to
automate independent application systems that took over the jobs of clerical workers and solved
a specific problem. For example: payroll, accounts payable, purchasing, ... This gave early
computer systems a bad reputation for putting people out of work (they still have that reputation
with some people). The early system designers tried to program the computer exactly the same
as the manual system. This was easier, but tended to prolong the fiction of functional data
ownership.


The system and its data were owned by the department responsible for the function. The data
was considered a private resource. Computer technology advanced and systems grew more
powerful, but the functional users still considered data to be a private resource. (A fiefdom.)
BAD!!! Also, each time a new application was developed, new files were defined. It took years
for people to realize (and really start to do something about) treating data as a corporate
resource that needs to be managed. This was one of the factors that started the database
movement.


Imagine a situation where money is not managed in a company. Each department has a pile it
controls. No sharing, no cross-department information sharing about the money. If a big
purchase came up, there would be no way to consolidate money to make it. It is ridiculous to
consider, but that is exactly the situation of the traditional approach.



Let's take a closer look at the traditional file approach.....


B. Traditional File Processing Approach


In the traditional approach, computer systems are designed for and used by individual
departments for specific tasks. Each department maintains and "owns" its own data. Thus,
little sharing takes place between departments (only the sharing that is mandated and required).
Each new system usually builds its own data files rather than trying to access existing ones,
because it is too tough to coordinate modifications and requirements. A central DP shop
provides the programming service for the functional departments (but it is still the department's
data). The DP shop must somehow manage all programming needs for the whole company
without the benefit of a global plan or centralized authority. Very tough to do! Large backlogs
characterized this setup. Impatient departments sometimes built their own mini-DP shop to
handle their own programming needs, but that seldom worked because existing data sharing
goes away.



1. Advantages


There are 2 transient advantages to the traditional approach (but they go away quickly). Be
aware that I am not defending these methods. I just want you to realize that when these systems
initially developed, there were valid reasons (advantages).


1st advantage...



a) Simple (natural)


The traditional file processing method was the easiest way for systems to develop given the
current state-of-the-art. Those people in the 50's weren't stupid. They realized that the situation
was not ideal; it's just that the computer technology was limited (e.g., tapes and cards were
available, not disks). (Later, it stayed popular because of momentum and the expense to
change.) Also, the systems that were automated were not the real money-makers for the
company. Marketing and production make money. Payroll and General Ledger just keep track
of it. There were lots of easier ways to improve company performance than by improving
computerized accounting systems (e.g., work on mission-critical manual systems). So, for the
time period, the traditional systems were the natural way to start. Not so today!


2nd advantage of traditional systems...




b) Cheap (initially)


Building integrated systems is much more complex than building stand-alone (traditional)
systems. Thus, it costs more to build integrated systems than to build stand-alone ones.
Computers were normally bought to save money, so it did not make sense to spend your "profit"
on unnecessary integration. Integration was viewed as unnecessary because companies had
gotten along fine without it in the manual systems (or so they thought). Once a manual system
was computerized and working, it was cheaper to leave it alone than to severely modify it.
(They did not know yet that 80% of all programming is maintenance.)



2. Disadvantages of traditional systems


The disadvantages of the traditional file approach far outweigh the "so-called advantages".
This is why database was developed.


1st disadvantage....



a) Distributed Ownership


In traditional systems each functional department "owned" its own data. This is the "my data"
syndrome. They feel it is their personal property (like office furniture) to do with as they
please. Often they are hesitant to let anyone else in the company see it (because then their
power would be diminished). Merging data became a political process. This does not help
global competitiveness.


3 negative aspects to distributed ownership.....



(1) Uncontrolled Redundancy


Each new system required new files to be built (since no one would share existing data even if
the technology existed to make it possible). Lots of data was stored multiple times, once for
each system that needed access to it. This was bad because:



disk space is wasted



data must be input several times



data must be updated several times (multiple occurrences)



data was inconsistent (next section)


Copyright

Jay Lightfoot, Ph.D.

5

Everyone realized that redundancy was a problem, but no one knew how to fix it short of
giving up control of "their data".
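The redundancy problem above can be sketched in a few lines of Python, using two hypothetical in-memory department "files" (the employee data is made up for illustration):

```python
# Two departments each keep their own copy of the same fact
# (hypothetical data, illustrating uncontrolled redundancy).
payroll_file = {"emp_id": 101, "name": "A. Smith", "address": "12 Elm St"}
benefits_file = {"emp_id": 101, "name": "A. Smith", "address": "12 Elm St"}

# The employee moves; only Payroll hears about it.
payroll_file["address"] = "99 Oak Ave"

# The same fact is now stored twice with two different values --
# disk space is wasted, the data was input twice, and the copies
# have silently diverged.
copies = {payroll_file["address"], benefits_file["address"]}
print(len(copies))  # 2 distinct "truths" for one address
```

The update had to be applied once per copy; missing even one copy produces the inconsistency described in the next section.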


2nd aspect.....



(2) Inconsistent Data


This is the worst of the problems -- it requires special attention. When you store data several
times (in several places) you WILL have inconsistencies, no matter how careful you are. The
worst case is where you miss an update. Then you have two (or more) versions of the data...
which is correct? A more subtle inconsistency is when the update is slightly incorrect (a
misspelled name or a different abbreviation) or the timing of the update is incorrect (a raise
after the P/R period...). Since each copy of the data is "owned" by a different group, it is really
no one's responsibility to make sure that it is globally correct. Responsibility falls between the
cracks. This allows multiple versions of the truth to be stored in the computer. This undermines
the users' confidence in a database if it appears to give incorrect answers. It can also mess up
customer relations if you send goods to the wrong address or bill for an incorrect invoice.
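The "subtle" inconsistency is worth a tiny sketch. These two strings are hypothetical redundant copies of the same street address, updated by different departments with different abbreviation habits:

```python
# Two redundant copies of what was meant to be the same address
# (hypothetical data). Different groups used different abbreviations.
shipping_copy = "123 WILLOW AVE."
billing_copy = "123 Willow Av."

# A program comparing the copies sees two different "truths":
print(shipping_copy == billing_copy)  # False

# Even normalizing case is not enough -- the abbreviations still differ,
# so goods can ship to an address that billing cannot match.
print(shipping_copy.lower() == billing_copy.lower())  # False
```

Neither copy is "wrong" to the department that owns it, which is exactly why global correctness falls between the cracks.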


3rd negative aspect of
distributed ownership....



(3) Islands of Information


A problem that redundancy and the "my data" syndrome cause is that groups develop their own
data environments that are independently maintained. These users become self-sufficient and
cut off from others. They are not willing to share or receive information. Not only redundant
data, but redundant tools and effort too.


2nd disadvantage of traditional systems...


b) Limited Data Sharing


Since everyone has their own copy of the data, no one shares. This wastes disk space and
perpetuates the cycle of independent file systems. For example, suppose you are designing a
new system that could use data already defined and maintained by another department. Since
they are unwilling to share (and it would require some rewrite of their (working) system), you
decide to take the easy way out and just re-enter the data. This is politically easier and simpler
for the moment, but it is very short-sighted. It delays fixing the problem.


3rd disadvantage of traditional
systems...




c) Inflexibility


Independent islands of information do not easily cooperate. When changes are needed it
becomes a political nightmare. People have to call in favors and make deals to get things done.
If quick action and responsiveness are required -- forget it. This is where traditional file
systems really start to hurt companies. They cause loss of business to more efficient
competition. In addition, changes to file structures cause a ripple effect in all the programs that
access that data. For example, 9-digit zip codes were a big headache. Independent file-based
systems tend to be designed for a specific purpose. They do not easily allow unplanned
processing.


The 1st negative aspect of inflexibility is...



(1) No Data Independence


This is very important. When the data definition or processing changes in a traditional file
system, it requires many modifications. The reason for this is similar to the data redundancy
discussion above. In this case, the data definition is redundantly stored. In addition, the way the
data are processed is also stored redundantly. So, you have to make changes in many places to
accommodate a "simple" data change. When you can make data modifications without
changing programs, you have "data independence".
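The redundantly stored data definition can be sketched like this. Both "programs" below hard-code the same hypothetical fixed-width record layout (a 10-character name plus a 5-digit zip), which is exactly what the 9-digit zip code headache ran into:

```python
# Hypothetical fixed-width record: name (10 chars) + zip (5 chars).
record = "SMITH     80639"

def payroll_zip(rec):   # Payroll's program embeds the layout...
    return rec[10:15]

def mailing_zip(rec):   # ...and Mailing's program embeds it AGAIN.
    return rec[10:15]

print(payroll_zip(record))  # 80639

# Widening zip to 9 digits changes the layout, but the old slice is
# baked into EVERY program -- each one must be found and edited.
new_record = "SMITH     806391234"
print(payroll_zip(new_record))  # still returns only "80639" -- wrong
```

With data independence, the layout would live in one place (the DBMS) and neither program would need to change.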


2nd aspect of inflexibility...






(2) Ad Hoc Queries Constrained


Another result of inflexibility is that it is difficult to get information in any format other than
the standard format. For instance, say a manager asks (off the top of her head), "How many red
widgets were sold on the east coast to people under 12 years old?" Chances are that a special
report does not exist with this information. (This is called an "ad hoc" query -- a one-time-only
query.) To get the information, a special report must be generated. Programmer time is usually
backlogged for months, so it is easier to pull the information by hand from old reports. This
defeats the purpose of the computer.


The data is out there and would be fairly easy to collect if proper tools existed. Recently, tools
have been built to help fix this problem (called 4th generation languages). These make it easier
for non-programmers to do ad hoc queries, but they still require in-depth knowledge about
where and how the data are stored. Most managers don't know (or care to know) this.
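To see why such tools matter, here is the manager's red-widget question as a single query. This is only a sketch using Python's built-in sqlite3 module; the sales table and its rows are hypothetical:

```python
import sqlite3

# A toy sales table (hypothetical data). With a query language, the
# one-time question is one statement, not a backlogged programming request.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE sales
               (product TEXT, color TEXT, region TEXT, buyer_age INT, qty INT)""")
con.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
                [("widget", "red", "east", 10, 3),
                 ("widget", "red", "west", 10, 5),
                 ("widget", "red", "east", 30, 2),
                 ("widget", "blue", "east", 11, 7)])

# The ad hoc question: red widgets sold on the east coast to buyers under 12.
(total,) = con.execute("""SELECT SUM(qty) FROM sales
                          WHERE product = 'widget' AND color = 'red'
                            AND region = 'east' AND buyer_age < 12""").fetchone()
print(total)  # 3
```

Notice the catch the lecture mentions: you still had to know the table name, the column names, and that age lives in `buyer_age` -- the in-depth knowledge most managers don't have.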


4th disadvantage of traditional systems...



d) Poor Standard Enforcement


Another result of distributed ownership. Standards are things like:



size of data fields (5 or 9 digit zip codes?)



abbreviations used (AV. or AVE. for avenue?)



timing of changes (when do you close the books?)



who can access data and when?



how do you name data fields?



how do you calculate accounting discounts?



what operating system do we use (UNIX or DOS?)



what file format is best (ASCII or EBCDIC?)



etc…


Standards are one of the main things that turn a group of independent file systems into a
corporate database. Traditional file systems have very poor (if any) standards enforcement,
primarily because no one is in charge of the data. No rules exist, so you get a free-for-all.


3 aspects to poor standards enforcement...





(1) Processing


Processing standards tell groups how and when to transform data into information. They can
include the obvious "how do you calculate accounting discounts" but can also include more
global questions like:



what hardware is used



what software is used



what network protocol is used



etc.....


Traditional file systems develop in the absence of any centralized control, hence there are few
processing standards enforced unless people informally get together or unless there is a strong
push from management towards commonality.



2nd aspect of poor standards enforcement...



(2) Storage


The way data are stored also requires standards. The obvious storage standards include:



how long are the data fields



what type are the data fields



what are your abbreviations


Some more subtle standards include:



what type of disk drive do you use



what encoding scheme is used



is data compression or encryption used


Traditional file-based systems do not generally have any storage standards because no central
control is exerted to make them conform.


3rd aspect of standards enforcement...



(3) Naming


Finally, naming schemes are a type of standard not often found in traditional file-based
systems. Because each group builds its files in isolation, the way data are named is inconsistent.
This means that even if users wanted to share data, they would have to figure out what it is
called first. Last disadvantage of traditional systems...



e) Excessive Costs


Traditional file-based systems cost more in the long run. They may seem cheaper when you are
developing them, but the combined effect of all the previously stated problems eventually adds
up. I group the cost problems into 2 general categories...



(1) Low Programmer Productivity


All the complexity of determining file names, working out standards, and dealing with data
independence problems results in LOW PROGRAMMER PRODUCTIVITY. The complexity
of trying to share data slows down programmers. Also, if they have to "reinvent the wheel"
every time they do something, it results in wasted effort. Finally, programmer productivity is
lower because much time is spent in maintenance (next topic).



(2) Excessive Program Maintenance


Lack of data independence and the inflexibility of traditional systems mean that LOTS of
effort is spent just to keep the systems working. Statistics show that 70% to 90% of all
programmer effort is spent maintaining existing programs -- not building new systems. This
does not advance the state of the company; it just keeps them even. Everyone suffers from the
high cost of maintenance. Programmers hate it (lousy job), managers dislike it because it is not
productive, and users dislike it because of the backlogs. BAD!! This is a prime factor in low
programmer productivity. It can be directly tied to the lack of data sharing and centralized
control.



C. Database Approach


1. Life Cycle Connection


Nolan has divided the life cycle of organizations' information systems into 6 phases.


STAGE 1: INITIATION - DP first introduced

STAGE 2: CONTAGION - Spreads rapidly, automating basic functions

STAGE 3: CONTROL - Costs go up, so more control needed

STAGE 4: INTEGRATION - Pull together independent systems (try)

--> STAGE 5: ARCHITECTURE - Develop enterprise-wide data model (plan)

STAGE 6: DISTRIBUTION - Distribute data and processing


The traditional approach occupies stages 1 through 4. As the organizational information system
goals advanced, more money was spent. Traditional DP processes were concerned with
automating existing systems. About stage 4 (integration), companies started experimenting with
the database approach. This approach combines (integrates) the islands of information into a
useful centralized data repository. Note that it initially saved money (disk space,
documentation, specification...) but gradually grew more expensive as more was required from
the information system.


The point is that the database approach is a natural outgrowth of the file processing system. An
evolution. As managers expect more from their information systems, the database approach
makes more sense. As the database approach comes on-line, things become possible that could
not be done without database. This makes going back to the old ways impossible (if you want to
stay competitive).



Let's look at the characteristics of the database approach in detail...



2. Database Approach Characteristics


As hinted above, the database approach
does away with the "my data" syndrome. It collects all
data into an organized repository. The approach brings a new way of thinking and some new
tools that permit the control of redundancy, improved flexibility, and centralized data control.


3 features
characterize the database approach:


1. Shareability of data

2. Centralized data management and control

3. Adaptability


Here's a summary of the approach...


The database approach begins by building a model of the organization (called an enterprise
model). The model summarizes the structure of the data used (and needed) in the company. It
shows the data entities and the relationships between them. (In real life, you seldom have a
global enterprise model; rather, you have a collection of smaller system models. Their collected
whole is equivalent to the enterprise model idea.) The structure is carefully built so that all
users' needs are taken care of with a minimum of redundancy. (Thus, sharing and redundancy
are both solved.) Next, you describe this structure to a piece of software (called a DBMS) that
can maintain both it and data in its format. Finally, you load the data and use the DBMS to
access and maintain it and its structure. It is the role of the DBMS to retrieve and store the
data. The DBMS also does its best to make sure the data are accurate (not necessarily correct).
It is all very complex, more so than this would indicate.


There are certain advantages to the database approach...


3. Advantages


Basically, the advantages of the database approach fix the disadvantages of the traditional
approach (on purpose). There should be no surprises here.


The 1st advantage ...





a) Data Integration (conceptual model)


For the first time, companies may see exactly what data they have and need. This sounds
simple, but it is surprising how many companies don't know what data they have and require.
Knowing this can lead to some efficiencies (e.g., why are we storing this information that no
one wants?). It allows streamlining of requirements and systems if effort is applied.


The 2nd advantage of database approach...



b) Controlled Redundancy


Very obvious. If you only store data once, then you do away with redundancy and its related
problems. Example: If you only store an employee address in one place, then a single update is
effective for all users. You cannot get rid of ALL redundancy. There are times when it is a
good idea to keep redundant data (we'll talk about that later in the semester), but you can
minimize it. This has a side benefit of saving disk space and requiring fewer maintenance
programs.


3rd advantage of database approach...



c) Data Sharing


Since the data is only stored in one place, you must share it. Actually, sharing is a very
important advantage on its own. It means that you truly are managing your data and treating it
like a company asset. However, when data are shared, it brings on some new problems not
present in traditional systems. Like concurrency control. We'll talk about that later.


4th
advantage...



d) Data Integrity


Your book calls this "data consistency". There is more to it than that. It refers to the problem
of inconsistent data in file-based systems. If data redundancy is reduced, then data is only
stored in one place, so the multiple-inconsistent-versions problem is fixed.


Integrity is the assurance that the data are:



accurate



complete




timely



reliable


It also deals with the notion that the data accurately represent the real-world system they
attempt to model. In other words, integrity is concerned with obvious things like:



names spelled correctly



proper account number



all social security numbers are unique


And also concerned with less obvious things like:



customer discount applied to proper customer



employee does not make
more money than their boss


Data integrity is a very important part of a DBMS (and a part that most do not do very well).
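Integrity rules like "all social security numbers are unique" can be declared once to the DBMS instead of being re-coded in every application. A sketch using Python's built-in sqlite3 (the table and rules are hypothetical examples drawn from the lists above):

```python
import sqlite3

# Declare the integrity rules ONCE, in the data definition.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE employee (
                   ssn    TEXT PRIMARY KEY,      -- all SSNs must be unique
                   name   TEXT NOT NULL,         -- record must be complete
                   salary INT  CHECK (salary > 0))""")

con.execute("INSERT INTO employee VALUES ('111-22-3333', 'A. Smith', 50000)")

# A duplicate SSN is rejected by the DBMS itself -- no application
# program had to write (or remember to write) the check.
try:
    con.execute("INSERT INTO employee VALUES ('111-22-3333', 'B. Jones', 60000)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

Note this covers the "obvious" rules; the subtler ones (an employee earning more than their boss) are exactly the kind most DBMSs do not handle well.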


5th advantage database approach...


e) Standard Enforcement


Standards enforcement is an important (and often ignored) part of the
database approach. Since
you have centralized control of data it is easier to enforce standards. For example:



naming standards (e.g., first 3 characters is system name)



field definition standards (MM/DD/YY for dates)



usage standards (no payroll data access on weekends)



processing standards (common encryption algorithm)



etc...


The DBA is the person(s) responsible for standards enforcement. If they do a thorough job,
then the database is much more usable.


6th advantage...



f) Centralized Control


Centralized control is sort of a side benefit of the database approach (not the direct reason for
using the approach, but a side effect). Most of the benefits of centralized control are themselves
advantages of the database approach. Centralized control provides some benefits that you
might not immediately think of. For instance, the 1st benefit of centralized control...




(1) Security/Privacy


You can better control data security and privacy if you have centralized control. Security is the
assurance that data is kept from unauthorized users. Privacy is the assurance that data is not
misused by those users. (Similar concepts, but they are not identical.) Having all the data in one
spot within the conceptual model makes this easier.


2nd benefit of centralized
control...



(2) Integrity


Integrity = accurate, complete, timely, reliable


As mentioned before, integrity is much easier if you have a centralized body in charge of the
data and the standards. Traditional file systems developed integrity through consistent "good
programming practices" and voluntary cooperation among departments. This, in part, explains
why some companies were more successful at system integration than others using similar
equipment and doing similar processing (i.e., the informal network was better).


3rd benefit of centralized control...



(3) Backup/Recovery


Backup and recovery are often not done properly in file-based systems because they are not
thought of as necessary. (Amateur systems managers don't realize the importance of backups.)
Ironically, backups are about the most important thing that you can do. When departments
handled their own data, backup and recovery was done in a spotty and inconsistent manner.
Some people were good at it and others were not. The reason that more problems did not occur
was that companies usually had only 1 computer center (for all their independent file-based
systems). The computer center handled backup and recovery without the departments'
knowledge. Backup and recovery are much easier when database methods are used because the
DBMS can handle the details (consistently).


4th benefit of centralized control...



(4) Standards


Obviously, standards are easier when they are centrally dictated. You cannot have standards
without

some central body (formal or informal) coming up with them.



7th advantage of the database approach...



g) Flexibility/Responsiveness


The database approach is much more flexible because you can access data without concern for
where or how it is stored. This idea is very close to the idea of data independence. Some books
also mention that database systems provide multiple retrieval paths to each data item. This is
true (but misses the prime point about data independence). By default, a flexible system is also
responsive. That is, you can make changes more quickly because there is less to change. (Less
detail to worry about.) Specifically, the flexibility of the database approach helps with...



(1) Ad Hoc Queries


Query languages have been developed that non-programmers can use to make one-time-only
(ad hoc) queries. In this way they can satisfy their unique information needs without using
traditional programming services (i.e., avoid the backlog). GOOD!! However, this still does not
fix the problem where users must have in-depth knowledge of data names and where the data is
stored. 4th generation languages (which is what most DBMS query tools are) help, but not
quite as well as the sales literature promises.

2nd benefit from better flexibility...



(2) Multiple User Views


A side benefit of the database approach is called "views". A view is a single user's perspective
of the database. Briefly, users do not need to see the whole database. They typically only need
to see a small portion (the same portion that the traditional file would have given). You define
this database subset with the DBMS mechanism called "view definition". (We'll do this later.)
Views enhance flexibility because they have great data independence. You can modify the
underlying data without disturbing the view, and change the view definition without modifying
the programs that use it (with one exception -- you can't remove data that the program directly
uses).
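A minimal sketch of the view mechanism, again using Python's built-in sqlite3 (the table, view, and data are hypothetical):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE employee
               (name TEXT, dept TEXT, phone TEXT, salary INT)""")
con.execute("INSERT INTO employee VALUES ('A. Smith', 'Sales', 'x1234', 50000)")

# The phone directory application only needs names and phones --
# its view never exposes (or depends on) the salary column.
con.execute("CREATE VIEW phone_book AS SELECT name, phone FROM employee")
print(con.execute("SELECT * FROM phone_book").fetchall())
# [('A. Smith', 'x1234')]

# The underlying table can change (a column added) without touching
# the view or the programs that query it: data independence at work.
con.execute("ALTER TABLE employee ADD COLUMN office TEXT")
print(con.execute("SELECT * FROM phone_book").fetchall())
# [('A. Smith', 'x1234')]
```

Dropping the `phone` column, on the other hand, would break the view -- that is the one exception noted above.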


8th advantage of the database approach...






h) Data Independence


Traditional programming builds the data definition and the data processing into the same unit
(called a program). This means that whenever the data definition changes, you have to modify
each program that uses it. BAD!!! The database approach separates the data definition and the
data access methods from the programs that process the data. This provides data independence
(that is, the ability to change the data structure without modifying programs that use the data).


9th advantage of the database approach...



i) Cost Savings


The database approach saves money in the long run. However, as we will see later, a good
DBMS is expensive, so this benefit is weak and a little harder to prove. Database cost savings
come from 2 general areas...



(1) Easier Application Development


Much of the work in traditional application development is defining the data and writing code
to access it. Also, a good deal of traditional application programs were devoted to integrity
checks. The database approach minimizes this because the data are already defined and the
access is up to the DBMS. Also, the integrity constraints can be built into the DBMS. This
speeds up application development. However, it slows down analysis and building time
because you have to do all that work up front. (No free lunches.) Code re-use is another
advantage of the database approach. In some systems, you can build generic routines and have
the DBMS store them for later re-use. There is no need to reinvent the wheel and re-test these
routines once you know they work. This idea was common in traditional file-based systems
also, but it is easier in the database approach.


2nd area where you get cost savings from DBMS approach...



(2) Reduced Maintenance


Since the data, the data definition, and the application programs in a DBMS are all
independent, you can change any without affecting the others (data independence). This makes
program maintenance (and testing) easier and quicker (less money). Old-style systems forced
the programmer to be concerned with "side effects" (unplanned effects of a program change).
Even simple modifications required extensive planning and testing (big $). The database
approach does not totally do away with the problem of maintenance side effects, but at least it
minimizes them. The end effect is better programmer productivity. All is not rosy for database.
There are some problems....



4. Disadvantages


This semester we will not dwell on the problems of the database approach. The advantages of
database greatly outweigh the problems. However, I want you to have a realistic view of
DBMS from this course (we are not "selling" DBMSs here).



a) Specialized Personnel and Hardware (big $)


Although the database approach is cheaper in the long run due to improved programmer
productivity (and other reasons), it has some up-front costs that are not trivial. These costs are
often overlooked by anxious users. You don't want to be surprised. For example...



(1) DBA, DA, Special Programmers


If you plan on using existing people, you will have to send them to training classes on your
DBMS product. These classes cost between $600 and $900 a day (plus expenses). Classes can
last weeks. You can also do on-site training, but that is also very expensive. If you want to hire
people already trained, they cost more than normal programmers (because they have had the
training classes or experience). Either way, you cannot avoid the training cost and the
subsequent loss of productivity while they learn. In addition, you will have to hire (or train) a
database administrator (DBA). This is a very specialized person (or group of people) and they
are not cheap. ($50,000 to $150,000 per year, normally.)


2nd area of increased costs...




(2) Bigger Disk, CPU, Memory


As far as hardware is concerned, you will probably have to upgrade your CPU, buy more fast
RAM, and buy more disk drives. This costs big $. This is in addition to the cost of the DBMS
itself. (They typically cost between $15,000 and $100,000 (or more).)


3rd reason

for increased cost...




(3) Potentially Slower Processing


Once they spend all this money, many are surprised that the overall system throughput is slower
than with their old file-based system (even with the beefed up CPU). The reason for this is that
the machine is having to do a lot more things (e.g., concurrency control, security, integrity,
recovery, ...). You want these things; it is just that people are surprised that they slow down the
machine. The real gain in DBMS comes in

programming cost savings

better data consistency

data sharing synergy.


2nd disadvantage of the database approach...



b) Data Concurrency Problems


Along with the advantages of data sharing come several problems. These problems are unique
to systems that allow more than one functional area to use the same data at the same time.


1st data sharing problem...



(1) Deadlock / Livelock


When data are shared, the DBMS must be careful to only allow one user at a time to change it.
If multiple users could change at the same time then it would be chaos. The DBMS does this by
giving sole access to update data to one user at a time. Anybody else that needs it must wait
until the first is through. Deadlock is a condition where 2 (or more) users are each waiting for
the other to release the same piece of data. Neither can continue until the other gives up the
needed data. Deadlock (a.k.a. deadly embrace) will continue until you turn the computer off or
until one of the users "gives in". For example, imagine a 4-way stop sign with two cars waiting
for each other. Traditionally the DBMS looks for deadlock and forces one user to release their
data and start over.
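The deadly embrace can be sketched with ordinary locks. This is only a toy Python illustration (the two "users" and the record names are made up, not DBMS code): each thread holds one lock and waits for the other's, and, like a DBMS, a waiter that times out becomes the victim and gives up.

```python
import threading

lock_a = threading.Lock()        # e.g., the INVENTORY record
lock_b = threading.Lock()        # e.g., the CUSTOMER record
barrier = threading.Barrier(2)   # guarantee each user holds one lock first
victims = []

def user(first, second, name):
    with first:
        barrier.wait()               # both users now hold their first lock
        # Each now waits for the lock the other holds: a deadly embrace.
        if second.acquire(timeout=0.5):
            second.release()
        else:
            victims.append(name)     # DBMS-style fix: pick a victim, give up

t1 = threading.Thread(target=user, args=(lock_a, lock_b, "user1"))
t2 = threading.Thread(target=user, args=(lock_b, lock_a, "user2"))
t1.start(); t2.start(); t1.join(); t2.join()
print(victims)   # at least one user gave up and must start over
```

Without the timeout, both threads would wait forever, which is exactly the "turn the computer off" situation described above.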


Livelock is a condition where a user cannot proceed for a long time because they are never
given access to needed data. This is different from deadlock in that they do not have anything
that anyone else wants. They are just "low on the priority list" so they never get access to a
popular piece of data. Again, the DBMS has ways to fix this problem. Both problems are
directly related to data sharing (and use lots of CPU cycles).


The 2nd problem directly related to data sharing...(concurrency)




(2) Transaction Rollback


One way the DBMS fixes deadlock is through "transaction rollback". Basically, a rollback is
the process of "undoing" everything that a user has done. (A "transaction" is an arbitrary unit of
work on the database.) When the DBMS selects a "victim", it undoes everything that user
has done (including releasing locks on data items). Once this is done, other users can
continue. Rollbacks also use lots of CPU cycles and are normally the result of data contention
(via data sharing).
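A minimal illustration of rollback using Python's built-in sqlite3 module (the account table and the overdraft rule are invented for the example):

```python
import sqlite3

# In-memory database standing in for the shared corporate database.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
con.execute("INSERT INTO account VALUES (1, 100)")
con.commit()

# A transaction: one arbitrary unit of work on the database.
try:
    con.execute("UPDATE account SET balance = balance - 500 WHERE id = 1")
    (balance,) = con.execute("SELECT balance FROM account WHERE id = 1").fetchone()
    if balance < 0:
        raise ValueError("overdraft")   # pretend this transaction is the victim
    con.commit()
except ValueError:
    con.rollback()                      # undo everything the transaction did

(balance,) = con.execute("SELECT balance FROM account WHERE id = 1").fetchone()
print(balance)   # back to 100: the update was undone
```

A real DBMS does the same undo work automatically when it selects a deadlock victim, which is why rollbacks cost CPU cycles.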


3rd problem of the database approach...



c) Data Security Problems


One of the few areas where the traditional approach was better. When you combine all the
traditional file-based systems you also combine all the security concerns of that data. Strangely
enough, the traditional file-based approach had fairly good security because people guarded
their data very carefully. Some of the security problems are not new (e.g., backup/recovery,
authorization). They are just bigger. Other problems spring up specifically because a DBMS is
present. For example, the potential for catastrophic failure. All in all, DBMS security is not
new, just more intense.


1st aspect of DBMS security is failure potential...



(1) Catastrophic Failure Possible


When you put all your eggs in one basket, you risk losing them. Translation: Lack of
redundancy can easily lead to losing everything if you are not careful. Not only are the data not
redundantly stored, but the processing is also performed on a single set of hardware. Dangerous!
When you depend on one system (DBMS) to perform all data retrieval, storage,
backup/recovery, integrity, ... then you are very vulnerable to a failure of that system. On a
less catastrophic note, a single bad application can sometimes bring down the whole DBMS.
This sort of thing rarely happened on old file-based systems.


2nd aspect of security problems present in DBMS...






(2) Elaborate Backup/Recovery


This generally comes under the heading of security. You can provide backup tapes, backup
disks, backup CPUs, ... all the way up to totally redundant sites. The more security you buy, the
more it costs. This is not really that different from file-based systems since most of the big ones
ran at a single hardware site. The problem is that all users are down when you do backups, not
just a few functions. (Note, some fancy DBMSs, like ORACLE, do incremental backups so
that users are not overly inconvenienced.)


4th disadvantage of the database approach...



d) Organizational Conflict


Shared database requires a consensus on data definition and ownership. This is where the DBA
really earns their money. It takes a good politician to navigate this minefield. This is a political
concern. Users from separate functional areas must agree on who owns the data, who can
maintain it, when, how it is processed, distributed, ... Normally textbooks minimize the political
aspects of data processing in the organization. You should remember that politics always
follows large budgets and power. This is true in the DP area.


The 5th database disadvantage...



e) Complex Conceptual Design


It is more complex to understand (and program) when you must be concerned with the entire
enterprise model. DBMS programmers must be concerned with transaction interaction, timing,
and a host of problems not present in old file-based systems. To solve this, few companies
actually have a full enterprise model. Most have subsets of the model that link logically related
sub-systems. This is a practical way to approach extreme complexity. It is not necessarily
good; it is just the way things are until we have better tools to help the programmer deal with
extreme detail (e.g., CASE tools, SEEs, ...).


6th database disadvantage...



f) Extra Cost of DBMS on Program Development


Having a DBMS in place can slow down certain aspects of the system life-cycle. If you ever
evaluate a DBMS, you should realize this.



1st of all you have testing difficulties...



(1) Testing Difficulties


You usually cannot test applications directly against the live database because it may damage it
(again, all eggs in one basket...). This means that you must either make an extract of the full
database or develop a sub-set of the database strictly for testing purposes. This takes time and
disk space. It also may not adequately test the problems that often occur under "load" (multiple
users).


2nd extra cost of program development...



(2) Complex Programming Environment


This is related to the "complex conceptual design" and "special programmers" topics above.
Basically, DBMS products are more complex to use than file-based systems. It takes time to
learn them. Also, a full enterprise model of a company is usually too complex for a single
programmer to comprehend without aids (CASE tools, ...). Taken together, you lose some of
your programmer productivity (short run) due to the complexity of the DBMS. All in all,
however, the advantages FAR OUTWEIGH the disadvantages. So the database approach is the
way of the future. NEXT let's examine the component parts of a DBMS....


IV. COMPONENTS OF A DATABASE SYSTEM


A DBMS is not one big program. It is made up of several large complex sub-systems and
several groups of people (classes of users). To begin, the basic functions of a DBMS are:



data storage, retrieval, and update

a user-accessible catalog

transaction support

concurrency control services

recovery services

authorization services

support for data communications

integrity services

services to promote data independence

utility services (reorganization, statistics, monitoring...)


These diverse functions are handled by the collection of sub-systems discussed above.



1st component of DBMS...


A. Data (raw)


The main purpose of a DBMS is to store and retrieve data. Obviously this is the most important
component. There are 2 general categories of data:

1. actual (real)

2. virtual (derived)

The actual data is stored on disk and can be read and written just as in traditional systems
(although you don't worry about storage details). Virtual data is not actually stored anywhere.
Rather, the derivation method is stored. When you require the information, it is created from
existing data. Example: Employee age would not really be stored in the database. Rather, their
birthday and the current day would be used to calculate it. In that way it would always be
current. Another example is if you want a sum of a purchase order. The sum would not
actually be on disk; rather it would be calculated "on the fly".
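The purchase-order example can be shown with a SQL view, where only the derivation is stored. A sketch using Python's sqlite3 (table and column names are invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE order_line (po INTEGER, qty INTEGER, unit_price INTEGER)")
con.executemany("INSERT INTO order_line VALUES (?, ?, ?)",
                [(1, 2, 10), (1, 3, 5), (2, 1, 7)])

# Virtual data: the purchase-order total is never stored on disk.
# Only the derivation (this view definition) is stored; the sum is
# computed "on the fly" each time the view is queried.
con.execute("""CREATE VIEW po_total AS
               SELECT po, SUM(qty * unit_price) AS total
               FROM order_line GROUP BY po""")

total = con.execute("SELECT total FROM po_total WHERE po = 1").fetchone()[0]
print(total)   # 35, derived from the stored order lines
```

Because the value is derived on demand, it is always current, just like the employee-age example.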


2nd component of DBMS...


B. Structure (Repository, Schema)


Raw data is not really very useful by itself. You cannot have any relationships among data
items in raw data (just a bunch of data items). What is missing is the structure that ties the
pieces together. We will be working with this all semester. It is a very powerful aspect of the
DBMS approach. It is also one of the main complexities that must be dealt with. There are
several basic concepts associated with structure:



structure within "record"

structure between "records"

cardinality (1:1, 1:N, M:N) - nature of relationship

degree (how many entities participate)

generalization/specialization

primary key uniqueness

referential integrity

user-defined integrity rules


A generic term for information about the structure of data is "metadata". In database
terminology, the part of the DBMS that defines the structure is called the "schema". The
schema is stored in the DBMS component called the "catalog" or the repository. Again, we
will get to most of this later; this is just an introduction. 3rd DBMS component...


C. DBMS


The Data Base Management System (DBMS) is the generic name for the collection of
sub-systems used to create, maintain, and provide controlled access to data. They range in
complexity from small PC-DBMS systems (Access, dBase IV, Rbase...) costing a few hundred
dollars to large mainframe products (ORACLE, IBM DB2) costing several hundred thousand
dollars.


1. DBMS Engine


The central component of the DBMS. A module that provides access to the physical structure
and the data. Also coordinates all of the other functions done by the DBMS. The central control
module that receives requests from users, determines where the data is and how to get it, and
issues physical I/O requests to the computer operating system. Also provides some misc.
services such as memory and buffer management, index maintenance, and disk management.


2nd sub-component...



2. Interface Subsystem


Provides facilities for users and applications to access the various components of the DBMS.
Most DBMS products provide a variety of languages and interfaces to satisfy the different types
of users and the different sub-systems that must be accessed. The following are common
interfaces that are provided (some are missing in the smaller/cheaper products). Note that
some DBMSs combine the functions of several interfaces into a single sub-system (e.g., SQL is
DDL, DML, and DCL combined).


1st interface component...(of 6)



a) Data Definition Language (DDL) (structure)


Used to define and maintain the database structures (e.g., records, tables, files, views, indexing,
...). Specifically, DDL defines:

all data items (type, specification...)

all record types (tables in relational model)

the relationships among record types (not in relational model)




user views (or subschemas)


The DDL is used to define the conceptual database and turn it into a physical database.
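For instance, a small DDL sketch in SQLite via Python (the employee table and view are hypothetical, not from the lecture):

```python
import sqlite3

con = sqlite3.connect(":memory:")

# DDL: define a record type (table), its data items, and a user view.
con.execute("""CREATE TABLE employee (
                   emp_id  INTEGER PRIMARY KEY,   -- data item with a type
                   name    TEXT NOT NULL,
                   dept    TEXT,
                   salary  INTEGER)""")
con.execute("""CREATE VIEW payroll_view AS        -- a user view (subschema)
               SELECT name, salary FROM employee""")

# The schema lands in the catalog; SQLite exposes it as sqlite_master.
names = [r[0] for r in con.execute(
    "SELECT name FROM sqlite_master ORDER BY name")]
print(names)   # ['employee', 'payroll_view']
```

Note how the catalog query shows the DDL's output being recorded as metadata, tying back to the schema/catalog discussion above.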


2nd interface component...



b) Query Language (DML)


Used to manipulate and query the data itself. Typically used by a host program or as ad hoc
commands in interactive mode. For example, you could select a subset of data based upon
some query criteria using the DML. In some database systems the DML also provides the
commands to "navigate" the data structure.
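A brief DML sketch, again using sqlite3 (the sample data is invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee (name TEXT, dept TEXT, salary INTEGER)")
con.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                [("Ann", "Sales", 40000), ("Bob", "Sales", 35000),
                 ("Cho", "IT", 50000)])

# DML: select a subset of the data based on query criteria.
rows = con.execute(
    "SELECT name FROM employee WHERE dept = ? AND salary > ?",
    ("Sales", 36000)).fetchall()
print(rows)   # [('Ann',)]
```

The same statement could be issued ad hoc in interactive mode or embedded in a host program, as described above.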


3rd interface component...



c) Control Language (DCL)


Used to grant and revoke access rights to individuals (and groups). A "right" is the privilege to
perform a data manipulation operation. For example, the DBA can grant a clerk the right to
access and delete INVENTORY records (but not to update them). Another related concept is a
database "role". A role is a predefined set of access rights and privileges that can be assigned
to a user. When the definition of the role changes, all users assigned to that role get the updated
access rights. Again, not all DBMSs call the DDL, DML, and DCL separate interfaces.
However, all 3 functions must be present in the DBMS.
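SQLite has no GRANT/REVOKE, so here is a toy Python model (not a real DBMS API) of the rights and roles described above: a clerk may read and delete INVENTORY but not update it, and changing the role definition updates every assigned user.

```python
# A toy sketch of DCL-style rights and roles; names are invented.
roles = {"clerk": {("INVENTORY", "read"), ("INVENTORY", "delete")}}
users = {"pat": "clerk"}   # pat is assigned the clerk role

def has_right(user, table, op):
    """A right is the privilege to perform one operation on one table."""
    return (table, op) in roles.get(users.get(user), set())

assert has_right("pat", "INVENTORY", "read")        # granted via role
assert not has_right("pat", "INVENTORY", "update")  # not granted

# Change the role definition: every user with that role gets the new right.
roles["clerk"].add(("INVENTORY", "update"))
assert has_right("pat", "INVENTORY", "update")
```

In a real DBMS this bookkeeping is done with GRANT/REVOKE statements and role definitions stored in the catalog.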


4th interface component...



d) Graphic Interface (QBE)


Optional. Some modern DBMSs provide a graphical representation of the data structure (a
table usually) that allows you to select which data items to query on and the conditions for
selection. Normally this feature is found in Query By Example sub-systems of relational
DBMSs. The graphic interface makes it easier for non-technical users to make complex
queries. Also handy because it is a common interface that can cross diverse DBMS systems.


5th interface component...




e) Forms Interface


Optional. A screen-oriented form is presented to the user. They respond by filling in the
blanks. The result is that the DBMS uses the form that you design to input and output data.


6th interface component...



f) High-Level Programming Language (interface)


Programmers need to be able to access data via a high-level language. This could be old-style
(3rd generation) languages like COBOL, FORTRAN, Pascal. Or it could be newer 4th
generation languages like Toolbox, Mapper, Easytrieve, ... . Most big mainframe products
(Ingress, Oracle, Informix...) include a 4GL as part of the DBMS. Studies have shown that
applications done using 4GLs result in a system that is up 10 to 20 times faster than using
traditional 3rd generation languages. (Note, code does not run faster, it is just debugged
sooner.) The interface usually is achieved by adding a few extra commands (verbs) to the
standard language and having a pre-processor translate these verbs into DBMS calls (using the
CALL format of the specific operating system). This method works well because the user does
not need to know the complexities of operating system calls and the resulting code is somewhat
portable. The interfaces to "old" languages are needed because there is a lot of code and
programmer experience out there that cannot be ignored.


3rd sub-component of the DBMS...



3. Repository / Dictionary Subsystem


The word "repository" is used to relate back to the Information Resource Management concept
mentioned earlier. The data in the database should be treated as a corporate resource. This
resource must be managed. The repository is more than a "data dictionary" or "catalog". It is
the central place that you store:



system documentation



data structure



project life cycle information



conceptual model information



etc…


CASE tools use it extensively. A DBMS system component must be present to manage and
control access to the repository. To a certain extent, the DCL does part and CASE tools do part.
It provides facilities for recording, storing, and processing descriptions of an organization's data
and data processing resources. Pretty new idea (still being defined by industry).


4th sub-component...



4. Data Integrity Subsystem


Provides facilities for managing the integrity in the database and the integrity of metadata in the
repository. There are 4 important functions:

intrarecord integrity - enforce constraints on data item values and types within each
record in the database

referential integrity - enforce the validity of references between records in the
database

user-defined integrity - business rules (arbitrary) that must be upheld (e.g., employee
can't make more than boss)

concurrency control - assuring the validity of data when multiple users access
simultaneously (get into more later)

This is a traditional weak spot on most commercial DBMSs.
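A sketch of the first two integrity functions using sqlite3 (the schema is invented; note that SQLite only enforces references when the foreign_keys pragma is switched on):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")   # SQLite needs this switched on

# Intrarecord integrity: CHECK constrains a data item's values.
con.execute("CREATE TABLE dept (dept_id INTEGER PRIMARY KEY)")
con.execute("""CREATE TABLE employee (
                   emp_id  INTEGER PRIMARY KEY,
                   salary  INTEGER CHECK (salary > 0),
                   dept_id INTEGER REFERENCES dept(dept_id))""")
con.execute("INSERT INTO dept VALUES (10)")
con.execute("INSERT INTO employee VALUES (1, 40000, 10)")

# Referential integrity: a reference to a missing dept is rejected.
try:
    con.execute("INSERT INTO employee VALUES (2, 30000, 99)")
    ok = True
except sqlite3.IntegrityError:
    ok = False
print(ok)   # False: the DBMS enforced the reference
```

User-defined rules (employee can't make more than boss) would be expressed the same way with CHECKs or triggers, where the product supports them.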


5th sub-component...



5. Security Management Subsystem


A subsystem that provides facilities to protect and control access to the database. The 2 most
important aspects of security are:

securing data from unauthorized access

protecting it against disasters

The first is done through passwords, views, and protection levels. Encryption is also widely
used. The second aspect uses backups, logs, before and after images, disaster recovery plans,
etc.


6th sub-component...



6. Backup/Recovery Subsystem



A subsystem that logs transactions and database changes and periodically makes backup copies
of the database. This is done so you don't lose data in the event of a problem. There are
different levels of problems that backup/recovery prepares for. They range from redoing a
transaction that was rolled back due to a concurrency conflict (minor) to totally restoring the
database after the computer center is destroyed (major).


7th sub-component...



7. Application Development Subsystem


Optional. Provides facilities so that end users and/or programmers can develop complete
database applications. Some use elaborate CASE tools as well as screen and report generators
to create full applications with minimal work. Others help write code from sketchy
specifications. In any event, this is an aid to non-technical users and a way to beef up
programmer productivity.


8th sub-component...



8. Performance Management Subsystem


The DBA needs some way to determine if the DBMS is performing well. These tools (often
called "monitoring utilities") give the DBA information needed to tune DBMS performance.
Example: A monitor utility can find data items that are accessed frequently enough to need an
index. They can also be used to determine if a data item needs to be on a faster disk drive or
possibly replicated.


4th DBMS component...


D. CASE Tools


Computer-Aided Software Engineering (CASE) tools are used for automating the software
development and maintenance tasks. Basically, you use a computer program to automate the
software life-cycle work. This is fairly new and not widely used in industry, but you will have
some contact during your career. Depending upon the CASE tool, it can automate:



feasibility analysis



requirements definition



logical design



prototyping

programming and testing



implementation



maintenance


In the future, most large software projects will use CASE tools (probably hooked into the data
repository subsystem). Some of the advantages of using CASE tools are:



improved productivity in development

improved quality through automated checking

automatic preparation and update of documentation

encouragement of prototyping and incremental development

automatic preparation of program code from requirements definition

reduced maintenance efforts


5th DBMS component...


E. Application Programs


These are the specific procedures (programs) that manipulate the data in the database. They are
written for specific applications needed by the business. You don't buy these as part of the
database. Some are fairly predictable (payroll, accounts payable, general ledger) while others
could be quite arbitrary and specialized. The main point is that they are specifically written for
the business using business-defined requirements. The best thing that the DBMS can do to help
is to provide tools to help write and maintain the programs.


6th DBMS component (ROLE)...

F. Data Administrators


The next 3 "components" could better be called "database roles". They are people that perform
certain tasks. This includes the DBA and the DA. The two positions are responsible for the
technical and political management of the DBMS and its use. The DBA is the person
responsible for managing the organization's data. The DBA has the ultimate authority and
responsibility for ensuring that the data is secure, accurate, and timely. Specifically, the DBA
does:



build and maintain the data dictionary



resolve disputes about data usage, control...



design and maintain the conceptual and physical database



monitor database for performance tuning



perform backups and recoveries



etc...



Sometimes the DBA position is held by a group of people and sometimes the technical side is
handled by a Data Administrator (DA) while the political side is handled by the DBA.


2nd "role"...


G. Systems Developers


These are programmers that write the application code to meet specification. They are typically
well versed in the use of the DBMS tools and in the function that they are programming.
Seldom are system developers the same people that use the system. This is partly due to the
specialized nature of the work and partly due to a need to provide security by separating the
users and the designers of the system.


3rd "role"...


H. End Users


Finally, the end users are those that have a need to access the data. Generally, end users are
non-technical and do not use most of the DBMS components (at least they don't know they are
using them). To the end user, the DBMS should look no worse than the file-based system that it
replaced (or you haven't done your job correctly). Hopefully it looks better.