Body.docx - gloriamjoy

southdakotascrawnyData Management

Nov 29, 2012


Eastwoods Professional College of Science and Technology

Page
1



Introduction


A database management system (DBMS) is a software package whose programs control the
creation, maintenance, and use of a database. It allows database administrators (DBAs)
and other specialists to conveniently develop databases for an organization's various
applications. A database is an integrated collection of data records, files, and other
objects. A DBMS allows different user application programs to concurrently access the
same database. DBMSs may use a variety of database models, such as the relational model
or object model, to conveniently describe and support applications. A DBMS typically
supports query languages, which are in fact high-level programming languages: dedicated
database languages that considerably simplify writing database application programs.
Database languages also simplify the organization of the database as well as retrieving
and presenting information from it. A DBMS provides facilities for controlling data
access, enforcing data integrity, managing concurrency control, recovering the database
after failures and restoring it from backup files, and maintaining database security.


Database servers are dedicated computers that hold the actual databases and run only the
DBMS and related software. Database servers are usually multiprocessor computers, with
generous memory and RAID disk arrays used for stable storage. Hardware database
accelerators, connected to one or more servers via a high-speed channel, are also used in
large-volume transaction processing environments. DBMSs are found at the heart of most
database applications. DBMSs may be built around a custom multitasking kernel with
built-in networking support, but modern DBMSs typically rely on a standard operating
system to provide these functions.


Database Management Systems


A database is a collection of related files that are usually integrated, linked or
cross-referenced to one another. The advantage of a database is that data and records
contained in different files can be easily organized and retrieved using specialized
database management software called a database management system (DBMS) or
database manager.


After reading this lesson, you should be able to:




Define the term database management system (DBMS).



Describe the basic purpose and functions of a DBMS.



Discuss the advantages and disadvantages of DBMSs.





DBMS Fundamentals

A database management system is a set of software programs that allows users to
create, edit and update data in database files, and to store and retrieve data from those
database files. Data in a database can be added, deleted, changed, sorted or searched, all
using a DBMS. If you were an employee in a large organization, the information about
you would likely be stored in different files that are linked together. One file about you
would pertain to your skills and abilities, another file to your income tax status, another
to your home and office address and telephone number, and another to your annual
performance ratings. By cross-referencing these files, someone could change a person's
address in one file and it would automatically be reflected in all the other files. DBMSs
are commonly used to manage:




Membership and subscription mailing lists



Accounting and bookkeeping information



The data obtained from scientific research



Customer information



Inventory information



Personal records



Library information





DBMSs and File Management Systems

Computerized file management systems (sometimes called file managers) are not
considered true database management systems because files cannot be easily linked to
each other. However, they can serve useful data management functions by providing a
system for storing information in files. For example, a file management system might be
used to store a mailing list or a personal address book. When files need to be linked, a
relational database should be created using database application software such as Oracle,
Microsoft Access, IBM DB2, or FileMaker Pro.


The Advantages of a DBMS


Improved availability: One of the principal advantages of a DBMS is that the same
information can be made available to different users.


Minimized redundancy: The data in a DBMS is more concise because, as a general rule,
the information in it appears just once. This reduces data redundancy, or in other words,
the need to repeat the same data over and over again. Minimizing redundancy can
therefore significantly reduce the cost of storing information on hard drives and other
storage devices. In contrast, data fields are commonly repeated in multiple files when a
file management system is used.




Accuracy: Accurate, consistent, and up-to-date data is a sign of data integrity. DBMSs
foster data integrity because updates and changes to the data only have to be made in one
place. The chances of making a mistake are higher if you are required to change the same
data in several different places than if you only have to make the change in one place.
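
The "change it in one place" idea can be illustrated with a minimal sketch using Python's
built-in sqlite3 module. The table and column names (employee, rating, address) are
hypothetical, chosen only to mirror the employee example earlier in this lesson; they do
not come from any particular product.

```python
import sqlite3

def build_db():
    """Create an in-memory database where each address is stored exactly once."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, address TEXT);
        CREATE TABLE rating (employee_id INTEGER REFERENCES employee(id),
                             year INTEGER, score INTEGER);
        INSERT INTO employee VALUES (1, 'Ada', '1 Old Road');
        INSERT INTO rating VALUES (1, 2011, 4), (1, 2012, 5);
    """)
    return conn

def ratings_with_address(conn):
    """Cross-reference each rating with the single stored copy of the address."""
    return conn.execute(
        "SELECT r.year, e.address FROM rating r "
        "JOIN employee e ON e.id = r.employee_id ORDER BY r.year"
    ).fetchall()

conn = build_db()
# One UPDATE in one place...
conn.execute("UPDATE employee SET address = '2 New Street' WHERE id = 1")
# ...is visible from every record that cross-references the employee.
print(ratings_with_address(conn))
```

Because the address is never copied into the rating rows, there is no second copy that
could be left out of date.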


Program and file consistency: Using a database management system, file formats and
system programs are standardized. This makes the data files easier to maintain because
the same rules and guidelines apply across all types of data. The level of consistency
across files and programs also makes it easier to manage data when multiple
programmers are involved.


User-friendly: Data is easier to access and manipulate with a DBMS than without it. In
most cases, DBMSs also reduce the reliance of individual users on computer specialists
to meet their data needs.


Improved security: As stated earlier, DBMSs allow multiple users to access the same
data resources. This capability is generally viewed as a benefit, but there are potential
risks for the organization. Some sources of information should be protected or secured
and only viewed by select individuals. Through the use of passwords, database
management systems can be used to restrict data access to only those who should see it.




The Disadvantages of a DBMS


There are basically two major downsides to using DBMSs. One of these is cost, and the
other the threat to data security.


Cost: Implementing a DBMS can be expensive and time-consuming, especially in
large organizations. Training requirements alone can be quite costly.


Security: Even with safeguards in place, it may be possible for some unauthorized users
to access the database. In general, database access is an all or nothing proposition. Once
an unauthorized user gets into the database, they have access to all the files, not just a
few. Depending on the nature of the data involved, these breaches in security can also
pose a threat to individual privacy. Steps should also be taken to regularly make backup
copies of the database files and store them because of the possibility of fires and
earthquakes that might destroy the system.





History


Databases have been in use since the earliest days of electronic computing. Unlike
modern systems, which can be applied to widely different databases and needs, the vast
majority of older systems were tightly linked to the custom databases in order to gain
speed at the expense of flexibility. Originally DBMSs were found only in large
organizations with the computer hardware needed to support large data sets.


1960s Navigational DBMS

As computers grew in speed and capability, a number of general-purpose database
systems emerged; by the mid-1960s there were a number of such systems in commercial
use. Interest in a standard began to grow, and Charles Bachman, author of one such
product, the Integrated Data Store (IDS), founded the "Database Task Group" within
CODASYL, the group responsible for the creation and standardization of COBOL. In
1971 they delivered their standard, which generally became known as the "Codasyl
approach", and soon a number of commercial products based on this approach were made
available.





The Codasyl approach was based on the "manual" navigation of a linked data set
which was formed into a large network. When the database was first opened, the program
was handed back a link to the first record in the database, which also contained pointers
to other pieces of data. To find any particular record the programmer had to step through
these pointers one at a time until the required record was returned. Simple queries like
"find all the people in India" required the program to walk the entire data set and collect
the matching results one by one. There was, essentially, no concept of "find" or "search".
This may sound like a serious limitation today, but in an era when most data was stored
on magnetic tape such operations were too expensive to contemplate anyway. Solutions
were found to many of these problems. Prime Computer created a CODASYL-compliant
DBMS based entirely on B-Trees that circumvented the record-by-record problem by
providing alternate access paths. They also added a query language that was very
straightforward. Further, there is no reason that relational normalization concepts cannot
be applied to CODASYL databases; however, in the final tally, CODASYL was very
complex and required significant training and effort to produce useful applications.
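
The pointer-by-pointer navigation described above can be sketched in a few lines of
Python. This is only an illustration of the idea, not CODASYL syntax: the Record class,
its fields, and the helper names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Record:
    name: str
    country: str
    next: Optional["Record"] = None  # pointer to the next record in the chain

def build_chain(rows):
    """Link rows into a chain and return the first record (the entry point)."""
    head = None
    for name, country in reversed(rows):
        head = Record(name, country, head)
    return head

def find_by_country(head, country):
    """Walk every pointer in turn; there is no built-in "find" to call."""
    matches = []
    current = head
    while current is not None:
        if current.country == country:
            matches.append(current.name)
        current = current.next
    return matches

head = build_chain([("Asha", "India"), ("Bob", "Canada"), ("Chetan", "India")])
print(find_by_country(head, "India"))
```

Even this simple query touches every record, which is the cost the paragraph above
refers to.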


IBM also had its own DBMS in 1968, known as IMS. IMS was a
development of software written for the Apollo program on the System/360. IMS was
generally similar in concept to Codasyl, but used a strict hierarchy for its model of data
navigation instead of Codasyl's network model. Both concepts later became known as
navigational databases due to the way data was accessed, and Bachman's 1973 Turing
Award presentation was The Programmer as Navigator. IMS is classified as a
hierarchical database. IDMS and CINCOM's TOTAL database are classified as network
databases.


1970s relational DBMS

Edgar Codd worked at IBM in San Jose, California, in one of their offshoot
offices that was primarily involved in the development of hard disk systems. He was
unhappy with the navigational model of the Codasyl approach, notably the lack of a
"search" facility. In 1970, he wrote a number of papers that outlined a new approach to
database construction that eventually culminated in the groundbreaking A Relational
Model of Data for Large Shared Data Banks.


In this paper, he described a new system for storing and working with large
databases. Instead of records being stored in some sort of linked list of free-form records
as in Codasyl, Codd's idea was to use a "table" of fixed-length records. A linked-list
system would be very inefficient when storing "sparse" databases where some of the data
for any one record could be left empty. The relational model solved this by splitting the
data into a series of normalized tables (or relations), with optional elements being moved
out of the main table to where they would take up room only if needed.




For instance, a common use of a database system is to track information about
users: their name, login information, various addresses and phone numbers. In the
navigational approach all of these data would be placed in a single record, and unused
items would simply not be placed in the database. In the relational approach, the data
would be normalized into a user table, an address table and a phone number table (for
instance). Records would be created in these optional tables only if the address or phone
numbers were actually provided.
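
The user, address and phone tables from this example might be sketched as follows with
Python's built-in sqlite3 module. The column names and the add_user helper are
assumptions made for illustration; the text does not prescribe any particular layout.

```python
import sqlite3

def build_db():
    """Create the three normalized tables in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE user (login TEXT PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE address (login TEXT REFERENCES user(login), address TEXT);
        CREATE TABLE phone (login TEXT REFERENCES user(login), number TEXT);
    """)
    return conn

def add_user(conn, login, name, addresses=(), phones=()):
    """Insert a user; optional rows are created only when data is provided."""
    conn.execute("INSERT INTO user VALUES (?, ?)", (login, name))
    conn.executemany("INSERT INTO address VALUES (?, ?)",
                     [(login, a) for a in addresses])
    conn.executemany("INSERT INTO phone VALUES (?, ?)",
                     [(login, p) for p in phones])

def row_counts(conn):
    """Count the rows in each table to show where space is actually used."""
    return {t: conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0]
            for t in ("user", "address", "phone")}

conn = build_db()
add_user(conn, "alice", "Alice", addresses=["1 Main St"],
         phones=["555-0100", "555-0101"])
add_user(conn, "bob", "Bob")  # no address or phone: no rows in the optional tables
print(row_counts(conn))
```

A user with no address or phone number takes up no space in the optional tables, while a
user with several phone numbers simply gets several rows.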


Linking the information back together is the key to this system. In the relational
model, some bit of information was used as a "key", uniquely defining a particular
record. When information was being collected about a user, information stored in the
optional tables would be found by searching for this key. For instance, if the login name
of a user is unique, addresses and phone numbers for that user would be recorded with
the login name as its key. This "re-linking" of related data back into a single collection is
something that traditional computer languages are not designed for.


Just as the navigational approach would require programs to loop in order to
collect records, the relational approach would require loops to collect information about
any one record. Codd's solution to the necessary looping was a set-oriented language, a
suggestion that would later spawn the ubiquitous SQL. Using a branch of mathematics
known as tuple calculus, he demonstrated that such a system could support all the
operations of normal databases (inserting, updating etc.) as well as providing a simple
system for finding and returning sets of data in a single operation.
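
The contrast between looping in the program and asking for a whole set at once can be
sketched with Python's sqlite3 module. The tables and the two helper functions are
hypothetical; the SQL shown is ordinary SQL, not Codd's original notation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (login TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE phone (login TEXT, number TEXT);
    INSERT INTO user VALUES ('alice', 'Alice'), ('bob', 'Bob');
    INSERT INTO phone VALUES ('alice', '555-0100'), ('alice', '555-0101');
""")

def phones_by_looping(conn, login):
    """Loop style: fetch every row and filter in application code."""
    result = []
    for row_login, number in conn.execute("SELECT login, number FROM phone"):
        if row_login == login:
            result.append(number)
    return sorted(result)

def phones_by_query(conn, login):
    """Set-oriented style: one declarative statement returns the whole set."""
    rows = conn.execute(
        "SELECT p.number FROM user u JOIN phone p ON p.login = u.login "
        "WHERE u.login = ? ORDER BY p.number", (login,))
    return [number for (number,) in rows]

print(phones_by_looping(conn, "alice"))
print(phones_by_query(conn, "alice"))
```

Both functions return the same numbers, but in the second one the re-linking by key is
expressed in the query and carried out by the DBMS.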


Codd's paper was picked up by two people at Berkeley, Eugene Wong and
Michael Stonebraker. They started a project known as INGRES using funding that had
already been allocated for a geographical database project, using student programmers to
produce code. Beginning in 1973, INGRES delivered its first test products, which were
generally ready for widespread use in 1979. During this time, a number of people had
moved "through" the group; perhaps as many as 30 people worked on the project,
about five at a time. INGRES was similar to System R in a number of ways, including the
use of a "language" for data access, known as QUEL. QUEL was in fact relational,
having been based on Codd's own Alpha language, but has since been corrupted to follow
SQL, thus violating much the same concepts of the relational model as SQL itself.


IBM itself did one test implementation of the relational model, PRTV, and a
production one, Business System 12, both now discontinued. Honeywell did MRDS for
Multics, and now there are two new implementations: Alphora Dataphor and Rel. All
other DBMS implementations usually called relational are actually SQL DBMSs.

In 1970, the University of Michigan began development of the MICRO
Information Management System based on D.L. Childs' Set-Theoretic Data model. Micro
was used to manage very large data sets by the US Department of Labor, the U.S.
Environmental Protection Agency, and researchers from the University of Alberta, the
University of Michigan, and Wayne State University. It ran on IBM mainframe
computers using the Michigan Terminal System. The system remained in production until
1998.


Late-1970s SQL DBMS

IBM started working on a prototype system loosely based on Codd's concepts as
System R in the early 1970s. The first version was ready in 1974/5, and work then started
on multi-table systems in which the data could be split so that all of the data for a record
(some of which is optional) did not have to be stored in a single large "chunk".
Subsequent multi-user versions were tested by customers in 1978 and 1979, by which
time a standardized query language (SQL) had been added. Codd's ideas were
establishing themselves as both workable and superior to Codasyl, pushing IBM to
develop a true production version of System R, known as SQL/DS, and, later, Database 2
(DB2).

Many of the people involved with INGRES became convinced of the future
commercial success of such systems, and formed their own companies to commercialize
the work but with an SQL interface. Sybase, Informix, NonStop SQL and eventually
Ingres itself were all being sold as offshoots to the original INGRES product in the
1980s. Even Microsoft SQL Server is actually a re-built version of Sybase, and thus,
INGRES. Only Larry Ellison's Oracle started from a different chain, based on IBM's
papers on System R, and beat IBM to market when the first version was released in 1978.



Stonebraker went on to apply the lessons from INGRES to develop a new
database, Postgres, which is now known as PostgreSQL. PostgreSQL is often used for
global mission-critical applications (the .org and .info domain name registries use it as
their primary data store, as do many large companies and financial institutions).


In Sweden, Codd's paper was also read and Mimer SQL was developed from the
mid-70s at Uppsala University. In 1984, this project was consolidated into an
independent enterprise. In the early 1980s, Mimer introduced transaction handling
for high robustness in applications, an idea that was subsequently implemented on most
other DBMSs.


1980s object-oriented databases

The 1980s, along with a rise in object-oriented programming, saw a shift in
how data in various databases was handled. Programmers and designers began to treat
the data in their databases as objects. That is to say that if a person’s data were in a
database, that person’s attributes, such as their address, phone number, and age, were now
considered to belong to that person instead of being extraneous data. This allows for
relations between data to be relations to objects and their attributes and not to individual
fields.




Another big game changer for databases in the 1980s was the focus on increasing
reliability and access speeds. In 1989, two professors from the University of Wisconsin at
Madison published an article at an ACM-associated conference outlining their methods
for increasing database performance. The idea was to replicate specific important, and
often queried, information and store it in a smaller temporary database that linked these
key features back to the main database. This meant that a query could search the smaller
database much more quickly, rather than search the entire dataset. This eventually led to
the practice of indexing, which is used by almost every operating system from Windows to
the system that operates Apple iPod devices.
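
The effect of an index can be observed with a minimal sketch in Python's sqlite3 module.
The people table and the index name idx_people_country are hypothetical. The sketch uses
SQLite's EXPLAIN QUERY PLAN statement, whose exact wording varies between SQLite
versions, so no specific output is shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, country TEXT)")
conn.executemany("INSERT INTO people VALUES (?, ?)",
                 [(f"person{i}", "India" if i % 10 == 0 else "Other")
                  for i in range(1000)])

QUERY = "SELECT name FROM people WHERE country = 'India'"

def plan(conn, sql):
    """Return SQLite's description of how it would execute the statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[-1] for row in rows)  # last column holds the detail text

plan_before = plan(conn, QUERY)  # no index yet: the whole table must be scanned
conn.execute("CREATE INDEX idx_people_country ON people (country)")
plan_after = plan(conn, QUERY)   # the planner can now look rows up via the index

print(plan_before)
print(plan_after)
```

The index plays the role of the "smaller database" in the paragraph above: a compact
structure holding the frequently searched column together with links back to the full rows.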


21st century NoSQL databases

In the 21st century a new trend of NoSQL databases was started. Those non-relational
databases are significantly different from the classic relational databases. They
often do not require fixed table schemas, avoid join operations by storing denormalized
data, and are designed to scale horizontally. Most of them can be classified as either
key-value stores or document-oriented databases.
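
A toy key-value store holding JSON documents gives a feel for "no fixed schema" and
"denormalized data". This is a plain Python dictionary, not the API of any of the NoSQL
products; the put and get helpers and the key format are assumptions for illustration.

```python
import json

# A toy key-value store: keys map to JSON documents with no fixed schema.
store = {}

def put(key, document):
    """Serialize the document and store it under the key."""
    store[key] = json.dumps(document)

def get(key):
    """Look the document up by key and deserialize it."""
    return json.loads(store[key])

# Denormalized: the phone numbers live inside the user document,
# so reading a user needs no join.
put("user:alice", {"name": "Alice", "phones": ["555-0100"], "city": "Pune"})
# A second document may carry different fields; no table schema is enforced.
put("user:bob", {"name": "Bob", "nickname": "bobby"})

print(get("user:alice")["phones"])
```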

In recent years there was a high demand for massively distributed databases with
high partition tolerance, but according to the CAP theorem it is impossible for a
distributed system to simultaneously provide consistency, availability and partition
tolerance guarantees. A distributed system can satisfy any two of these guarantees at the
same time, but not all three. For that reason many NoSQL databases are using what is
called eventual consistency to provide both availability and partition tolerance guarantees
with a maximum level of data consistency.

The most popular software in that category includes memcached, Redis,
MongoDB, CouchDB, Apache Cassandra and HBase, all of which are open-source software
products.


XML databases

A subset of NoSQL databases are XML databases. They all use the industry-standard
XML data storage format. XML is an open, machine-readable and cross-platform data
format widely used for interoperability among different IT systems. The XML database
software market is dominated by commercial vendor products.

Software in this category includes BaseX, Clusterpoint Server, eXist, MarkLogic
Server, MonetDB/XQuery, Oracle, and Sedna. All XML databases can be attributed to
document-oriented databases.


Current trends

In 1998, database management was in need of a new style of databases to solve
current database management problems. Researchers realized that the old trends of
database management were becoming too complex and there was a need for automated
configuration and management. Surajit Chaudhuri, Gerhard Weikum and Michael
Stonebraker were the pioneers that dramatically affected the thought of database
management systems. They believed that database management needed a more modular
approach and there were too many specifications needed for users. Since this new
development process of database management there are more possibilities. Database
management is no longer limited to “monolithic entities”. Many solutions have been
developed to satisfy the individual needs of users. The development of numerous
database options has created flexibility in database management.


There are several ways database management has affected the field of technology.
Because organizations' demand for directory services has grown as they expand in size,
businesses use directory services that provide prompted searches for company
information. Mobile devices are able to store more than just the contact information of
users, and can cache and display a large amount of information on smaller displays.
Search engine queries are able to locate data within the World Wide Web. Retailers have
also benefited from the developments with data warehousing, recording customer
transactions. Online transactions have become tremendously popular for e-business.
Consumers and businesses are able to make payments securely through some company
websites.






Components



DBMS engine accepts logical requests from various other DBMS
subsystems, converts them into physical equivalents, and actually accesses
the database and data dictionary as they exist on a storage device.



Data definition subsystem helps the user create and maintain the data
dictionary and define the structure of the files in a database.



Data manipulation subsystem helps the user to add, change, and delete
information in a database and query it for valuable information. Software
tools within the data manipulation subsystem are most often the primary
interface between user and the information contained in a database. It
allows the user to specify its logical information requirements.



Application generation subsystem contains facilities to help users
develop transaction-intensive applications. It usually requires that the user
perform a detailed series of tasks to process a transaction. It facilitates
easy-to-use data entry screens, programming languages, and interfaces.



Data administration subsystem helps users manage the overall database
environment by providing facilities for backup and recovery, security
management, query optimization, concurrency control, and change
management.
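
The data definition and data manipulation subsystems listed above correspond loosely to
the two kinds of SQL statements in the following sketch, written with Python's sqlite3
module. The book table is hypothetical. In SQLite the role of the data dictionary is
played by the built-in sqlite_master table; other DBMSs expose theirs differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition: declare the structure of a table.
conn.execute(
    "CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, copies INTEGER)")

# Data dictionary: SQLite records its schema in the sqlite_master table.
tables = [name for (name,) in
          conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]

# Data manipulation: add, change, delete, then query.
conn.executemany("INSERT INTO book (title, copies) VALUES (?, ?)",
                 [("SQL Basics", 3), ("Old Manual", 1)])
conn.execute("UPDATE book SET copies = copies + 2 WHERE title = 'SQL Basics'")
conn.execute("DELETE FROM book WHERE title = 'Old Manual'")
remaining = conn.execute("SELECT title, copies FROM book").fetchall()

print(tables)
print(remaining)
```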





Modeling language

A modeling language is a data modeling language used to define the schema of each
database hosted in the DBMS, according to the DBMS database model. Database
management systems (DBMS) are designed to use one of five database structures to
provide simple access to information stored in databases. The five database structures
are:



the hierarchical model,



the network model,



the relational model,



the multidimensional model, and



the object model.

Inverted lists and other methods are also used. A given database management
system may provide one or more of the five models. The optimal structure depends on the
natural organization of the application's data, and on the application's requirements,
which include transaction rate (speed), reliability, maintainability, scalability, and cost.


The hierarchical structure

The hierarchical structure was used in early mainframe DBMS. Records’
relationships form a treelike model. This structure is simple but inflexible because the
relationship is confined to a one-to-many relationship. IBM’s IMS system and RDM
Mobile are examples of a hierarchical database system with multiple hierarchies over the
same data. RDM Mobile is a newly designed embedded database for a mobile computer
system. The hierarchical structure is used primarily today for storing geographic
information and file systems.


A hierarchical database model is a data model in which the data is organized into
a tree-like structure. The structure allows representing information using parent/child
relationships: each parent can have many children, but each child has only one parent
(also known as a 1-to-many relationship). All attributes of a specific record are listed
under an entity type.



Example of a hierarchical model




In a database an entity type is the equivalent of a table. Each individual record is
represented as a row, and each attribute as a column. Entity types are related to each other
using 1:N mappings, also known as one-to-many relationships. This model is recognized
as the first database model created by IBM in the 1960s.
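
The parent/child rule of the hierarchical model can be sketched with a Python dictionary
in which every record names exactly one parent. The department names and helper
functions are hypothetical and are not taken from IMS or any other product.

```python
# Each record stores a single parent, so the data forms a tree.
parent_of = {
    "Sales": "Company",
    "Engineering": "Company",
    "Backend": "Engineering",
    "Frontend": "Engineering",
}

def children(node):
    """One parent can have many children (the "many" side of 1:N)."""
    return sorted(child for child, parent in parent_of.items() if parent == node)

def path_to_root(node):
    """Each child has exactly one parent, so there is exactly one path upward."""
    path = [node]
    while path[-1] in parent_of:
        path.append(parent_of[path[-1]])
    return path

print(children("Engineering"))
print(path_to_root("Backend"))
```

Because a dictionary key can hold only one value, a record with two parents cannot be
expressed here, which is the same restriction the hierarchical model imposes.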


Currently the most widely used hierarchical databases are IMS developed by IBM
and Windows Registry by Microsoft.


The Network Structure


The network structure consists of more complex relationships. Unlike the
hierarchical structure, it can relate to many records and accesses them by following one
of several paths. In other words, this structure allows for many-to-many relationships.






The network model is a database model conceived as a flexible way of
representing objects and their relationships. Its distinguishing feature is that the schema,
viewed as a graph in which object types are nodes and relationship types are arcs, is not
restricted to being a hierarchy or lattice.
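
A many-to-many relationship of the kind the network model allows can be sketched as a
small graph in Python. The supplier and part names are hypothetical; the point is only
that a record may be reached from more than one "owner", which a tree cannot express.

```python
from collections import defaultdict

# Arcs between records, kept in both directions so either end can be the start.
supplies = defaultdict(set)      # supplier -> parts
supplied_by = defaultdict(set)   # part -> suppliers

def link(supplier, part):
    """Add one arc of the graph, recorded from both ends."""
    supplies[supplier].add(part)
    supplied_by[part].add(supplier)

link("Acme", "bolt")
link("Acme", "nut")
link("Bolts R Us", "bolt")

# "bolt" has two owners, so this is a graph rather than a hierarchy.
print(sorted(supplied_by["bolt"]))
print(sorted(supplies["Acme"]))
```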



Example of a Network Model.


The network model's original inventor was Charles Bachman, and it was
developed into a standard specification published in 1969 by the CODASYL Consortium.



The relational structure


The relational structure is the most commonly used today. It is used by
mainframe, midrange and microcomputer systems. It uses two-dimensional rows and
columns to store data. The tables of records can be connected by common key values.
While working for IBM, E.F. Codd designed this structure in 1970. The model is not easy
for the end user to run queries with because it may require a complex combination of
many tables.


The multidimensional structure


The multidimensional structure is similar to the relational model. The dimensions
of the cube-like model have data relating to elements in each cell. This structure gives a
spreadsheet-like view of data. This structure is easy to maintain because records are
stored as fundamental attributes (in the same way they are viewed) and the structure is
easy to understand. Its high performance has made it the most popular database structure
when it comes to enabling online analytical processing (OLAP).
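
A very small cube can be sketched in Python as a dictionary whose keys are one value per
dimension. The dimensions (year, region, product), the figures, and the roll_up helper
are all made up for illustration; real OLAP engines are far more elaborate.

```python
from collections import defaultdict

# Each cell of the cube is addressed by one value per dimension.
DIMENSIONS = ("year", "region", "product")
cube = {
    ("2012", "North", "Widgets"): 120,
    ("2012", "South", "Widgets"): 80,
    ("2012", "North", "Gadgets"): 50,
    ("2013", "North", "Widgets"): 150,
}

def roll_up(cube, keep):
    """Sum the cells while keeping only the named dimensions."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for coords, value in cube.items():
        totals[tuple(coords[i] for i in idx)] += value
    return dict(totals)

print(roll_up(cube, ["year"]))
print(roll_up(cube, ["region", "product"]))
```

Rolling up along different dimensions gives the spreadsheet-like summaries the paragraph
describes, without changing how the cells themselves are stored.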






The object-oriented structure


The object-oriented structure has the ability to handle graphics, pictures, voice
and text types of data without difficulty, unlike the other database structures. This
structure is popular for multimedia Web-based applications. It was designed to work with
object-oriented programming languages such as Java.


The dominant model in use today is the ad hoc one embedded in SQL, despite the
objections of purists who believe this model is a corruption of the relational model since
it violates several fundamental principles for the sake of practicality and performance.
Many DBMSs also support the Open Database Connectivity API that supports a standard
way for programmers to access the DBMS.


Before the database management approach, organizations relied on file processing
systems to organize, store, and process data files. End users criticized file processing
because the data was stored in many different files, each organized in a different way.
Each file was specialized to be used with a specific application. File processing was
bulky, costly and inflexible when it came to supplying needed data accurately and
promptly. Data redundancy is an issue with the file processing system because the
independent data files produce duplicate data, so when updates were needed each separate
file would need to be updated. Another issue is the lack of data integration. The data is
dependent on other data to organize and store it. Lastly, there was not any consistency or
standardization of the data in a file processing system, which makes maintenance difficult.
For these reasons, the database management approach was produced.


Data structure


Data structures (fields, records, files and objects) are optimized to deal with very
large amounts of data stored on a permanent data storage device (which implies relatively
slow access compared to volatile main memory).


Database query language

A database query language and report object allows users to interactively
interrogate the database, analyze its data and update it according to the users' privileges
on data. It also controls the security of the database. Data security prevents unauthorized
users from viewing or updating the database. Using passwords, users are allowed access
to the entire database or subsets of it called subschemas. For example, an employee
database can contain all the data about an individual employee, but one group of users
may be authorized to view only payroll data, while others are allowed access to only
work history and medical data.
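
The shape of a subschema can be sketched with an SQL view, here using Python's sqlite3
module. The employee table, the payroll_view name, and the columns helper are
hypothetical. SQLite has no user accounts or passwords, so this sketch only shows what a
restricted subset looks like; granting a user access to the view but not to the
underlying table is assumed to be handled by a multi-user DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT,
                           salary INTEGER, medical_notes TEXT);
    INSERT INTO employee VALUES (1, 'Ada', 5000, 'none');
    -- A view exposing only the payroll columns.
    CREATE VIEW payroll_view AS SELECT id, name, salary FROM employee;
""")

def columns(conn, relation):
    """Return the column names a query against the relation would see."""
    cursor = conn.execute(f"SELECT * FROM {relation}")
    return [col[0] for col in cursor.description]

print(columns(conn, "employee"))
print(columns(conn, "payroll_view"))
```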



If the DBMS provides a way to interactively enter and update the database, as
well as interrogate it, this capability allows for managing personal databases. However, it
may not leave an audit trail of actions or provide the kinds of controls necessary in a
multi-user organization. These controls are only available when a set of application
programs is customized for each data entry and updating function.


Transaction mechanism


A database transaction mechanism ideally guarantees ACID properties in order to
ensure data integrity despite concurrent user accesses (concurrency control), and faults
(fault tolerance). It also maintains the integrity of the data in the database. The DBMS
can maintain the integrity of the database by not allowing more than one user to update
the same record at the same time. The DBMS can help prevent duplicate records via
unique index constraints; for example, no two customers with the same customer
numbers (key fields) can be entered into the database. See ACID properties for more
information.
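
The unique-key rule and the all-or-nothing behavior of a transaction can be sketched
together with Python's sqlite3 module. The customer table and the add_customers helper
are hypothetical; the sketch shows only the atomicity and uniqueness aspects, not
concurrent access by several users.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_no INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customer VALUES (1, 'Ada')")
conn.commit()

def add_customers(conn, rows):
    """Insert all rows or none: any failure rolls the whole batch back."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.executemany("INSERT INTO customer VALUES (?, ?)", rows)
        return True
    except sqlite3.IntegrityError:
        return False

# The second row reuses customer number 1, so the whole batch is rejected,
# including the otherwise valid first row.
ok = add_customers(conn, [(2, "Bob"), (1, "Duplicate")])
count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
print(ok, count)
```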







External, logical and internal view



Traditional view of data

A DBMS provides the ability for many different users to share data and process
resources. As there can be many different users, there are many different database needs.
The question is: How can a single, unified database meet varying requirements of so
many users?

A DBMS minimizes these problems by providing three views of the database
data: an external view (or user view), a logical view (or conceptual view), and a physical
view (or internal view). The user's view of a database program represents data in a format
that is meaningful to a user and to the software programs that process those data.




One strength of a DBMS is that while there is typically only one conceptual (or
logical) and one physical (or internal) view of the data, there can be an endless number of
different external views. This feature allows users to see database information in a more
business-related way rather than from a technical, processing viewpoint. Thus the external
view refers to the way the user views the data, and the physical view refers to the way the
data are physically stored and processed.
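
As a rough illustration of several external views over one stored table, the following
sketch uses SQL views in Python's sqlite3 module. The employees table and the view names
payroll_view and history_view are assumptions made up for this example. SQLite has no user
accounts, so the sketch only shows how each view exposes a different subset of the same
data, not how access to a view would be granted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One stored table holding all employee data (hypothetical columns).
CREATE TABLE employees (
    emp_id INTEGER PRIMARY KEY,
    name TEXT,
    salary REAL,
    work_history TEXT
);
INSERT INTO employees VALUES (1, 'Ann Lee', 52000, 'Hired 2009');
INSERT INTO employees VALUES (2, 'Bob Ray', 48000, 'Hired 2011');

-- Two external views, each exposing only what one group of users needs.
CREATE VIEW payroll_view AS SELECT emp_id, name, salary FROM employees;
CREATE VIEW history_view AS SELECT emp_id, name, work_history FROM employees;
""")

# cursor.description lists the column names that a query returns.
payroll_columns = [d[0] for d in conn.execute("SELECT * FROM payroll_view").description]
history_columns = [d[0] for d in conn.execute("SELECT * FROM history_view").description]
print(payroll_columns)
print(history_columns)
```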


Features and capabilities


Alternatively, and especially in connection with the relational model of database
management, the relation between attributes drawn from a specified set of domains can
be seen as being primary. For instance, the database might indicate that a car that was
originally "red" might fade to "pink" in time, provided it was of some particular "make"
with an inferior paint job. Such higher arity relationships provide information on all of
the underlying domains at the same time, with none of them being privileged above the
others.







Characteristics of Databases


A computerized database refers to a collection of related files that are digitized.
More often than not, this kind of database is more useful than manila folders and filing
cabinets. For one, it provides
an efficient method of pulling facts together. It allows the
slicing, dicing, mixing, and matching of information for a myriad of purposes and needs.


After reading this lesson, you should be able to:




Identify some of the common types of databases.



Discuss some of the key issues associated with providing data access.



Justify the importance of maintaining separate files.



Justify the importance of minimizing redundancy between data files.


Types of Databases


Some databases are small enough to be created and contained on your desktop
computer while others are so large that they are stored on network servers or powerful
mainframe computers. Popular database management software applications such as
Paradox, Access, and dBASE 5 are utilized to manage databases small enough to be
stored on a desktop computer. Individuals use these programs to perform specific tasks,
such as to keep track of customers and manage data for small research projects.




Some databases are so large that they must be stored on a server or mainframe
computer and accessed by going online. Some large, public databases can be accessed
online for a fee. These are referred to as information utilities or online services. You may
have heard of or used some of the more popular online services, including America
Online, CompuServe, and Microsoft Network. These online services provide access to a
myriad of information sources concerning weather, news, travel, shopping, and a great
deal more. Even specialized public databases can be accessed online. Lexis, which gives
lawyers access to local, state, and federal laws, is just one example. There are many other
types of large databases. Many museums have put artwork online, creating virtual art
museums. Most university libraries have created electronic databases to complement or
substitute for their card catalogues.


Database Access


Database access is a sticky issue, as you will see. The following example
illustrates some of the difficulties that data administrators, organizations, and society in
general now face. A decade ago, Congress created a medical practitioner database to keep
physicians disciplined by the medical board of one state from avoiding detection if they
moved to another state and applied for a medical license. Should doctor databases be
opened to the public? If given access to the database, patients could look up information
about a specific doctor and find out if other patients have lodged complaints against them.




In one case, a woman whose obstetrician left unsightly scars on her abdomen after
delivering her baby said that if she had been allowed access to the database, she would have
learned of other patients' complaints and chosen another doctor. On the other hand, many
physicians complain that by making such data available, they are less likely to perform
high-risk procedures, even when it might be beneficial to the patient. Those doctors
performing high-risk procedures are more likely to receive complaints and could
potentially face disciplinary action. You can begin to see the challenges associated with
determining who should have access to certain types of information.


Database Attributes for Effective Use


It is important to keep some database files separate, even though they contain closely
related information. For example, it's usually a good idea to keep employee files
containing home address, telephone number, job title, and work location separate from
files containing an employee's tax and salary information. There are at least two reasons
for maintaining these records in separate files:


It is generally more efficient and effective to search for and extract information from
smaller sets of data. In other words, users can access data more rapidly by using smaller
files than by trying to access the same data in a large composite file containing vast
amounts and types of data. The more types of data contained in a database, the more
complex the database becomes and the more difficult it is for database management
systems to manipulate it accurately and efficiently.

Different types of data should be accessible to different groups of people. For example,
all employees may be given access to employee information such as work location, job
title, and home telephone number. Tax deduction and salary information might only be
made available to human resource personnel and the accounting department in an
organization. Different functional groups in an organization require access to different
types of data. This makes sense when you consider the need to maintain some degree of
security and personal privacy.


Multiple Sources


A database is more useful if there is little redundancy between the files it contains. In
other words, it would be inefficient and a waste of human and computer resources to have
the same information repeated over and over again in different files. Some companies
maintain databases with very similar information. Sometimes there are good reasons for
this; e.g. for security purposes. However, it's simply more costly to maintain accurate
information in multiple locations. In addition, there would also be a need to resolve
discrepancies occurring between the same information in multiple files.





One of the beauties of databases is the ability to link together data from multiple sources
to accomplish a specific task. For example, I might store the file containing a mailing list
for Pennsylvania with similar lists compiled for individuals in the other forty-nine states.
If a political action group in Pennsylvania decides to develop a campaign for the northeast
region, they can extract the names of potential supporters for the states of New York,
Connecticut, Maine, and other northeastern states.
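
The mailing-list example can be sketched as a single query once the state lists are stored
together. The sketch below uses Python's sqlite3 module; the mailing_list table, the state
abbreviations, and the names in it are invented for illustration and are not data from the
lesson.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mailing_list (name TEXT, state TEXT)")
# Hypothetical supporters drawn from several state lists.
conn.executemany(
    "INSERT INTO mailing_list VALUES (?, ?)",
    [("Ann Lee", "PA"), ("Bob Ray", "NY"), ("Cy Dunn", "TX"),
     ("Di Moss", "CT"), ("Ed Voss", "ME")],
)

# States chosen for the regional campaign (an assumed, partial list).
northeast = ("NY", "CT", "ME")
placeholders = ", ".join("?" for _ in northeast)
rows = conn.execute(
    f"SELECT name FROM mailing_list WHERE state IN ({placeholders}) ORDER BY name",
    northeast,
).fetchall()
names = [r[0] for r in rows]
print(names)
```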


Simple definition


A database management system is the system in which related data is stored in an
efficient or compact manner. "Efficient" means
that the data which is stored in the DBMS
can be accessed quickly and "compact" means that the data takes up very little space in
the computer's memory. The phrase "related data" means that the data stored pertains to a
particular topic.


Specialized databases have existed for scientific, imaging, document storage and
like uses. Functionality drawn from such applications has begun appearing in mainstream
DBMSs as well. However, the main focus, at least when aimed at the commercial data
processing market, is still on descriptive attributes on repetitive record structures.




Thus, the DBMSs of today roll together frequently needed services and features
of attribute management. By externalizing such functionality to the DBMS, applications
effectively share code with each other and are relieved of much internal complexity.
Features commonly offered by database management systems include:


Query ability


Querying is the process of requesting attribute information from various
perspectives and combinations of factors. Example: "How many 2-door cars in Texas are
green?" A database query language and report writer allow users to interactively
interrogate the database, analyze its data, and update it according to each user's
privileges on the data.
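
The example question can be expressed as a short SQL query. The following sketch runs it
through Python's sqlite3 module against a made-up cars table; the table layout and the
sample rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cars (id INTEGER PRIMARY KEY, doors INTEGER, state TEXT, color TEXT)"
)
# Hypothetical sample data.
conn.executemany(
    "INSERT INTO cars (doors, state, color) VALUES (?, ?, ?)",
    [(2, "TX", "green"), (4, "TX", "green"), (2, "TX", "red"),
     (2, "CA", "green"), (2, "TX", "green")],
)

# "How many 2-door cars in Texas are green?"
green_two_door_tx = conn.execute(
    "SELECT COUNT(*) FROM cars WHERE doors = ? AND state = ? AND color = ?",
    (2, "TX", "green"),
).fetchone()[0]
print(green_two_door_tx)
```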


Backup and replication


Copies of attributes need to be made regularly in case primary disks or other
equipment fails. A periodic copy of attributes may also be created for a distant
organization that cannot readily access the original. DBMSs usually provide utilities to
facilitate the process of extracting and disseminating attribute sets. When data is
replicated between database servers, so that the information remains consistent
throughout the database system and users cannot tell or even know which server in the
DBMS they are using, the system is said to exhibit replication transparency.
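
A periodic copy can be sketched with the backup method that Python's sqlite3 connections
provide (available since Python 3.7). The parts table and the variable names primary and
replica are assumptions for this example. Note that this shows a one-time copy only; it is
not continuous replication, so later changes to the primary do not reach the copy.

```python
import sqlite3

primary = sqlite3.connect(":memory:")
primary.execute("CREATE TABLE parts (part_no INTEGER PRIMARY KEY, name TEXT)")
primary.execute("INSERT INTO parts VALUES (1, 'bolt')")
primary.commit()

# Copy the whole primary database into a second database.
replica = sqlite3.connect(":memory:")
primary.backup(replica)

# A change made after the copy exists only in the primary.
primary.execute("INSERT INTO parts VALUES (2, 'nut')")
primary.commit()

primary_rows = primary.execute("SELECT COUNT(*) FROM parts").fetchone()[0]
replica_rows = replica.execute("SELECT COUNT(*) FROM parts").fetchone()[0]
print(primary_rows, replica_rows)
```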


Rule enforcement


Often one wants to apply rules to attributes so that the attributes are clean and
reliable. For example, we may have a rule that says each car can have only one engine
associated with it (identified by Engine Number). If somebody tries to associate a second
engine with a given car, we want the DBMS to deny such a request and display an error
message. However, with changes in the model specification such as, in this example,
hybrid gas-electric cars, rules may need to change. Ideally such rules should be able to be
added and removed as needed without significant data layout redesign.
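
One way to picture a rule that can be added and later removed is a unique index, sketched
here with Python's sqlite3 module. The engines table and the index name one_engine_per_car
are hypothetical, and a unique index is only one of several mechanisms a DBMS may offer for
this kind of rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE engines (engine_number TEXT PRIMARY KEY, car_id INTEGER NOT NULL)"
)
# The rule: each car may appear at most once in the engines table.
conn.execute("CREATE UNIQUE INDEX one_engine_per_car ON engines (car_id)")
conn.execute("INSERT INTO engines VALUES ('E-100', 1)")

try:
    conn.execute("INSERT INTO engines VALUES ('E-200', 1)")  # second engine, same car
    rejected = False
except sqlite3.IntegrityError:
    rejected = True  # the DBMS denied the request

# Hybrid cars arrive: drop the rule without changing the table layout.
conn.execute("DROP INDEX one_engine_per_car")
conn.execute("INSERT INTO engines VALUES ('E-200', 1)")
engines_on_car_1 = conn.execute(
    "SELECT COUNT(*) FROM engines WHERE car_id = 1"
).fetchone()[0]
print(rejected, engines_on_car_1)
```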


Security


For security reasons, it is desirable to limit who can see or change specific
attributes or groups of attributes. This may be managed directly on an individual basis, or
by the assignment of individuals and privileges to groups, or (in the most elaborate
models) through the assignment of individuals and groups to roles which are then granted
entitlements.
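
The role-based model can be sketched in a few lines of plain Python. This is not how any
particular DBMS implements security; the role names, user names, and the is_allowed helper
are all hypothetical and serve only to show individuals being assigned to roles that carry
entitlements.

```python
# Entitlements granted to each role: (attribute, action) pairs.
ROLE_PRIVILEGES = {
    "hr": {("salary", "read"), ("salary", "write"), ("work_location", "read")},
    "staff": {("work_location", "read")},
}

# Individuals are assigned to roles rather than to privileges directly.
USER_ROLES = {"alice": {"hr"}, "bob": {"staff"}}


def is_allowed(user, attribute, action):
    """Return True if any of the user's roles grants the requested entitlement."""
    return any(
        (attribute, action) in ROLE_PRIVILEGES.get(role, set())
        for role in USER_ROLES.get(user, set())
    )


print(is_allowed("alice", "salary", "read"))
print(is_allowed("bob", "salary", "read"))
```

Changing what a whole department may see then means editing one role instead of every
individual account.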



Computation


Common computations requested on attributes are counting, summing, averaging,
sorting, grouping, cross-referencing, and so on. Rather than have each computer
application implement these from scratch, they can rely on the DBMS to supply such
calculations.
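
As a small illustration, the sketch below asks the DBMS to count, sum, average, group, and
sort in a single SQL statement, using Python's sqlite3 module. The sales table and its
figures are invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 100.0), ("east", 50.0), ("west", 30.0)],
)

# Counting, summing, averaging, grouping, and sorting are all done by the DBMS.
summary = conn.execute(
    "SELECT region, COUNT(*), SUM(amount), AVG(amount) "
    "FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(summary)
```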


Change and access logging


This describes who accessed which attributes, what was changed, and when it was
changed. Logging services allow this by keeping a record of access occurrences and
changes.
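
Change logging can be sketched with a database trigger, shown here through Python's
sqlite3 module. The accounts and change_log tables and the trigger name are hypothetical.
Because SQLite has no user accounts, this sketch records what changed and when, but not
who made the change; a multi-user DBMS would also record the user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL);
CREATE TABLE change_log (
    account_id INTEGER,
    old_balance REAL,
    new_balance REAL,
    changed_at TEXT DEFAULT CURRENT_TIMESTAMP  -- when the change happened
);
-- Every update of a balance writes one row to the log.
CREATE TRIGGER log_balance_change AFTER UPDATE OF balance ON accounts
BEGIN
    INSERT INTO change_log (account_id, old_balance, new_balance)
    VALUES (OLD.id, OLD.balance, NEW.balance);
END;
INSERT INTO accounts VALUES (1, 100.0);
""")

conn.execute("UPDATE accounts SET balance = 80.0 WHERE id = 1")
log = conn.execute(
    "SELECT account_id, old_balance, new_balance FROM change_log"
).fetchall()
print(log)
```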


Automated optimization

For frequently occurring usage patterns or requests, some DBMSs can adjust
themselves to improve the speed of those interactions. In some cases the DBMS will
merely provide tools to monitor performance, allowing a human expert to make the
necessary adjustments after reviewing the statistics collected.




Meta-data repository


Metadata is data describing data. For example, a listing that describes what
attributes are allowed to be in data sets is called "meta-information".
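
Most DBMSs let a program read this kind of listing back. The sketch below does so with
Python's sqlite3 module, using SQLite's table_info pragma and its sqlite_master catalog
table; the cars table is a made-up example, and other DBMSs expose their metadata through
different catalogs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cars (id INTEGER PRIMARY KEY, make TEXT, color TEXT)")

# Each table_info row is (cid, name, type, notnull, default_value, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(cars)")]

# sqlite_master is SQLite's own catalog of the objects stored in the database.
tables = [
    r[0] for r in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]
print(columns)
print(tables)
```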


Advanced DBMS


An example of an advanced DBMS is a distributed database management system
(DDBMS), which manages a collection of data that logically belong to the same system but
are spread out over the sites of a computer network. The two aspects of a distributed
database are distribution and logical correlation:




Distribution: The fact that the data are not resident at the same site, so that we can
distinguish a distributed database from a single, centralized database.




Logical Correlation: The fact that the data have some properties which tie them
together, so that we can distinguish a distributed database from a set of local
databases or files which are resident at different sites of a computer network.





Types of database engines

Embedded database

An embedded database system is a database management system (DBMS) that is
tightly integrated with application software that requires access to stored data, such
that the database system is “hidden” from the application’s end-user and requires little or
no ongoing maintenance. It is actually a broad technology category that includes database
systems with differing application programming interfaces (SQL as well as proprietary,
native APIs); database architectures (client/server and in-process); storage modes
(on-disk, in-memory and combined); database models (relational, object-oriented,
Entity-Attribute-Value model and network/CODASYL); and target markets. The term
"embedded database" can be confusing because only a small subset of embedded
database products is used in real-time embedded systems such as telecommunications
switches and consumer electronics devices.


In-memory database

An in-memory database (IMDB; also main memory database system or MMDB) is a
database management system that primarily relies on main memory for computer data
storage. It is contrasted with database management systems which employ a disk storage
mechanism. Main memory databases are faster than disk-optimized databases since the
internal optimization algorithms are simpler and execute fewer CPU instructions.
Accessing data in memory reduces the I/O reading activity when querying the data, which
provides faster and more predictable performance than disk. In applications where
response time is critical, such as telecommunications network equipment and mobile
advertising networks, main memory databases are often used.
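
SQLite, which ships with Python, can serve as a small illustration of both ideas: it runs
inside the application process, and the special name ":memory:" keeps the whole database in
main memory. The sessions table below is hypothetical. The sketch makes no performance
claim; it only shows that the data lives in memory and disappears when the connection is
closed.

```python
import sqlite3

# ":memory:" asks SQLite to keep the entire database in main memory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (id INTEGER PRIMARY KEY, subscriber TEXT)")
conn.execute("INSERT INTO sessions (subscriber) VALUES ('555-0100')")
rows_before = conn.execute("SELECT COUNT(*) FROM sessions").fetchone()[0]
conn.close()  # the in-memory database is discarded here

# A new in-memory connection starts empty: nothing was written to disk.
fresh = sqlite3.connect(":memory:")
tables_after = fresh.execute("SELECT COUNT(*) FROM sqlite_master").fetchone()[0]
print(rows_before, tables_after)
```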


The Value of Data and Databases


Many of the actions you take during the day become data for organizations to use for
their own profit and learning. Using an automated teller machine, filling out a form for a
driver's license, ordering a book on the Internet, and booking a flight on an airline all
become digitized data to be sorted, managed, and used by others. In each of these cases,
someone at some time has decided how the data from these users will be received, stored,
processed, and made available to others.


After reading this lesson, you should be able to:



Describe the value of data to organizations.



Discuss how and why organizations and individuals attempt to extract
meaning
from data.



Data and Organizations




For financial and/or legal reasons, organizations collect and store vast amounts of data
about employees, customers, finances, vendors, inventory, competitors, and markets, to
name only a few. The amount of data needed is important because people generally make
better decisions if they have more data available to them.


For example, a car dealership, bank, or credit union will make better decisions about whom
to give car loans by looking at a person's credit report information than if they simply
based their decision on the word of the customer. Looking at your credit report, a bank
representative would see a listing of your payment history on loans and credit cards,
including your mortgage. She would also see information about outstanding loans, debt
repayment and credit limits. The report may also contain information about jobs you have
held and public record information (birth date and address).


Likewise, a factory will improve its ability to manufacture products by tracking and
managing data about inventory (name, identification number, location, and quantity),
production schedule, quality control measures, and much more. You can begin to see why
collecting data is important. However, the true value of data cannot be realized until it is
appropriately organized, stored, analyzed, and eventually used for a specific purpose.





Extracting Meaning from Data


Raw data is not very useful. Suppose a human resources manager of a local hospital
sends out a survey consisting of 25 multiple-choice questions to assess the level of
employee satisfaction of its 150 nurses. Let's assume for a moment that 114 surveys are
completed and returned to the manager. This is the raw data and basically has no
meaning.


As a next step, the responses of each nurse to each question on the survey are entered and
stored in a computer. The data is still raw and meaningless. It becomes more organized if
it is entered into a computer with a plan and purpose in mind. If the manager is smart, he
will assign each nurse an ID number and enter all of his or her responses, not at
random, but in the order in which they appear in the survey.


Ultimately, the data cannot be understood until it is analyzed. This can be accomplished
by calculating the average score for each nurse, the average score for all the nurses at the
hospital, the average score for the nurses in each department, and so on. As the manager
begins to process and analyze the data, it eventually begins to tell a story. Hopefully, the
story will increase understanding in a way that enables the manager to improve the level
of satisfaction of the group of employees.
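
The averages described above can be sketched in a few lines of Python. The three records
below are invented stand-ins for the 114 returned surveys, each with only three answers
instead of 25, and the field names nurse_id, department, and answers are assumptions made
for this example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical survey responses, keyed by nurse ID and entered in question order.
responses = [
    {"nurse_id": 1, "department": "ICU", "answers": [4, 5, 3]},
    {"nurse_id": 2, "department": "ICU", "answers": [2, 3, 4]},
    {"nurse_id": 3, "department": "ER", "answers": [5, 5, 5]},
]

# Average score for each nurse.
per_nurse = {r["nurse_id"]: mean(r["answers"]) for r in responses}

# Average score for the nurses in each department.
by_department = defaultdict(list)
for r in responses:
    by_department[r["department"]].append(per_nurse[r["nurse_id"]])
per_department = {dept: mean(scores) for dept, scores in by_department.items()}

# Average score for all nurses at the hospital.
overall = mean(list(per_nurse.values()))
print(per_nurse, per_department, overall)
```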




Understanding Database Terminology


A computer cannot process data unless it is organized in special ways: into characters,
fields, records, files, and databases.

After reading this lesson, you should be able to:




Define the key terms needed to understand what a database is and how it is used.



Identify the purpose and role of characters in data processing.



Identify the purpose and role of fields in data processing.



Identify the purpose and role of records in data processing.



Identify the purpose and role of database files in data processing.



Identify the purpose and role of databases in data processing.



Identify the purpose and role of data management systems in data processing.



Identify the purpose and role of keys in data processing.


Character


A character is the most basic element of data that can be observed and manipulated.
Behind it are the invisible data elements we call bits and bytes, referring to physical
storage elements used by the computer hardware. A character is a single symbol such as a
digit, letter, or other special character (e.g., $, #, and ?).


Field




A field contains an item of data; that is, a character, or group of characters that are
related. For instance, a grouping of related text characters such as "John Smith" makes up
a name in the name field. Let's look at another example. Suppose a political action group
advocating gun control in Pennsylvania is compiling the names and addresses of potential
supporters for their new mailing list. For each person, they must identify the name,
address, city, state, zip code and telephone number. A field would be established for each
type of information in the list. The name field would contain all of the letters of the first
and last name. The zip code field would hold all of the digits of a person's zip code, and
so on. In summary, a field may contain an attribute (e.g., employee salary) or the name of
an entity (e.g., person, place, or event).




Record





A record is composed of a group of related fields. As another way of saying it, a
record contains a collection of attributes related to an entity such as a person or product.
Looking at the list of potential gun control supporters, the name, address, zip code and
telephone number of a single individual would constitute a record. A payroll record
would contain the name, address, social security number, and title of each employee.











Database File




As we move up the ladder, a database file is defined as a collection of related
records. A database file is sometimes called a table. A file may be composed of a
complete list of individuals on a mailing list, including their addresses and telephone
numbers. Files are frequently categorized by the purpose or application for which they
are intended. Some common examples include mailing lists, quality control files,
inventory files, or document files. Files may also be classified by the degree of
permanence they have. Transaction files are only temporary, while master files are much
more long-lived.
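
The ladder from characters to fields, records, and files can be pictured with a short
Python sketch. The SupporterRecord class and the two sample entries are invented for
illustration; a real DBMS stores these structures in its own format rather than as Python
objects.

```python
from dataclasses import dataclass, fields


@dataclass
class SupporterRecord:
    """One record: a group of related fields describing a single person."""
    name: str
    address: str
    city: str
    state: str
    zip_code: str
    telephone: str


# A database file (table): a collection of related records. Sample data is made up.
mailing_list_file = [
    SupporterRecord("John Smith", "12 Oak St", "Erie", "PA", "16501", "555-0100"),
    SupporterRecord("Mary Jones", "9 Elm Ave", "Scranton", "PA", "18503", "555-0101"),
]

# The fields that make up every record in this file.
field_names = [f.name for f in fields(SupporterRecord)]

# A field is itself a group of characters.
zip_characters = list(mailing_list_file[0].zip_code)
print(field_names)
print(zip_characters)
```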






Database

Organizations and individuals use databases to bring independent sources of data together
and store them electronically. Thus, a database is composed of related files that are
consolidated, organized and stored together. One collection of related files might pertain
to employee information. Another collection of related files might contain sports
statistics.

Organizations and individuals may have and use many different databases, depending on
the nature of the work involved. For example, a library database might consist of several
related, but separate, databases including book titles and author names, book description,
books on order, books checked out, and similar sets of information. Most organizations
have product information databases, customer databases, and human resource databases
that contain information about employees, salaries, home address, stock purchase plans,
and tax deduction information. In each case, the data stored in a database is independent
from the application programs which use and process the data.








Data Management System

Data management systems are used to access and manipulate data in a database. A
database management system is a software package that enables users to edit, link, and
update files as needs dictate. Database management systems will be discussed in greater
detail in another lesson.

Key


In order to track and analyze data effectively, each record requires a unique
identifier or what is called a key. The key must be completely unique to a particular
record just as each individual has a unique social security number assigned to them. In
fact, social security numbers are often used as keys in large databases. You might think
that the name field would be a good choice for a key in a mailing list. However, this
would not be a good choice because some people might have the same name. A key must
be identified or assigned to each record for computerized information processing to
function correctly. An existing field may be used if the entries are entirely unique, such
as a social security or telephone number. In most cases, a new field will be developed to
hold a key, such as a customer number or product number.
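
A new key field of this kind can be sketched with Python's sqlite3 module. The customers
table and the sample name are assumptions for the example; the point is only that the DBMS
assigns a distinct customer number even when two people share the same name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "customer_number INTEGER PRIMARY KEY AUTOINCREMENT, "  # key assigned by the DBMS
    "name TEXT NOT NULL)"
)

# Two different people who happen to share a name.
for name in ("John Smith", "John Smith"):
    conn.execute("INSERT INTO customers (name) VALUES (?)", (name,))

rows = conn.execute(
    "SELECT customer_number, name FROM customers ORDER BY customer_number"
).fetchall()
print(rows)
```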



Conclusion



As you can see, much of what we have discussed about database systems can be
applied to our everyday lives. A database management system can be a very useful asset
in the business world and in people's daily lives. Furthermore, because it has the ability
to increase networking and mobility, the database system can be a great benefit for
communication among different businesses. Learning about database management helps
people become more aware of the systems they use in business and in their everyday
lives.













Bibliography


Codd, E.F. (1970). "A Relational Model of Data for Large Shared Data Banks". In:
Communications of the ACM 13 (6): 377-387.

"A set theoretic data structure and retrieval language" (PDF), William R. Hershey and
Carol H. Easthope, Paper from the Session on Data Structures, Spring Joint Computer
Conference, May 1972 in ACM SIGIR Forum, Volume 7, Issue 4 (December 1972), pp.
45-55, DOI=10.1145/1095495.1095500

"Sets, Data Models and Data Independence", by Ken North, a Dr. Dobb's Blogger,
March 10, 2010

Description of a set-theoretic data structure, D. L. Childs, 1968, Technical Report 3 of
the CONCOMP (Research in Conversational Use of Computers) Project, University of
Michigan, Ann Arbor, Michigan, USA

Feasibility of a Set-Theoretic Data Structure: A General Structure Based on a
Reconstituted Definition of Relation, D. L. Childs, 1968, Technical Report 6 of the
CONCOMP (Research in Conversational Use of Computers) Project, University of
Michigan, Ann Arbor, Michigan, USA

MICRO Information Management System (Version 5.0) Reference Manual, M.A.
Kahn, D.L. Rumelhart, and B.L. Bronson, October 1977, Institute of Labor and Industrial
Relations (ILIR), University of Michigan and Wayne State University



Development of an object-oriented DBMS; Portland, Oregon, United States; Pages: 472-482;
1986; ISBN 0-89791-204-7

Performance enhancement through replication in an object-oriented DBMS; Pages 325-336;
ISBN 0-89791-317-5

Seltzer, M. (2008, July). Beyond Relational Databases. Communications of the ACM,
51(7), 52-58. Retrieved July 6, 2009, from Business Source Complete database.

itl.nist.gov (1993) Integration Definition for Information Modeling (IDEF1X). 21
December 1993.

Website: http://en.wikipedia.org/wiki/Database_management_system