Chapter 12 Designing Databases

burpfancyΗλεκτρονική - Συσκευές

8 Νοε 2013 (πριν από 3 χρόνια και 9 μήνες)

123 εμφανίσεις

149

Chapter 12

Designing Databases



Chapter Objectives


Chapter 12 introduces students to database design, discussing both logical and physical database
design. During logical design, logical data models are created for each known user interface, the
logic
al models for each interface are integrated into a consolidated logical database model, the
application’s conceptual E
-
R data model is translated into normalized data requirements, and the
logical database model is then integrated with the translated E
-
R m
odel. During physical design,
decisions about data types, data structures, file organizations, and media are made.


This chapter introduces the relational data model, the most common notation used for representing
detailed data requirements necessary for
database design. Concepts of the relational data model,
normalization principles for creating relational models with desirable properties, a process for
combining different relational data models into a consolidated relational data model, and how to
trans
late an entity
-
relationship data model into a relational data model are presented.


Chapter 12 reviews several choices systems builders have for the design of physical data storage
structures. Chapter 12 emphasizes those decisions for which a systems an
alyst is most likely to have
input, as opposed to the very technical data structure decisions made by database administrators and
analysts. You should emphasize to your students throughout this section that physical design issues
are addressed as part of
a team of system analysts and technology specialists. Systems analysts bring
an in
-
depth understanding of the application to the table, whereas technology specialists intimately
understand the relative efficiency, security, integrity, and reliability of d
ifferent technologies in
different settings. To communicate with these specialists, systems analysts must have a sound
knowledge of physical design issues, which is accomplished from studying the chapters in this
section. Some students are attracted to t
he more technical topics (possibly because of prior exposure
to computer science topics), and others are interested in the more methodological and managerial
issues. Thus, another challenge of teaching Chapter 12, as well as Chapters 15 and 16, is keeping

a
balance of business and technical issues so that you hold the attention of all students.


Chapter 12 provides a transition from typical systems analysis topics to data analysis topics; these
topics are often discussed in database management courses. Yo
u should coordinate the teaching of
this chapter with database management course faculty so that purposeful redundancy occurs and
important topics are not missed across the courses in your curriculum.



Instructional Objectives


Specific student learning o
bjectives are included at the beginning of the chapter.
From an
instructor’s point of view, the objectives of this chapter are to
:



1.

Show the relationship between systems analysis and design and database design. The
philosophy of this textbook is that

database design is a topic of joint interest between systems
analysts and database specialists. In most cases, considerable interaction between conceptual
and logical database design exists, combining the top
-
down approach of conceptual data

Modern Systems Analysis and Design, 3
rd

edition

Instructor’s Manual


150

modeling wit
h a bottom
-
up approach using logical data modeling tools. Application area
knowledge and enterprise database knowledge (often found in a data administration group) are
necessary to converge to a viable logical design for an application’s database. Rememb
er, an
application’s logical database design does not imply a separate physical database for that
application, but rather only a separate view of data, which may be part of a more comprehensive
physical database.



2.

Show the relationship between logical
database design and physical database design. An
important point under this objective is that it is the job of the analyst to capture during prior
systems development phases
--
especially analysis and logical design
--
all of the parameters
needed to make phy
sical system design decisions. Although some information is not necessary
for the techniques of prior stages, this information (such as field length, data integrity
requirements, and an estimated number of entity instances) is essential for physical syste
m
design.



3.

Present the relational data model as a logical data model that captures the structure of data in a
very fundamental, stable form and suggests ways to organize data during physical database
design, resulting in desirable data maintenance prop
erties (which avoid certain data anomalies).



4.

Show students how a conceptual data model can obscure some details about data requirements
that must be better understood in order to perform physical database design.



5.

Show students, using an example f
rom Hoosier Burger, how to translate a conceptual data model
into a logical data model and how to incorporate the data requirements of specific system outputs
into the process of forming a logical data model.



6.

Improve the ability of students who will t
ake systems analyst jobs to communicate with technical
specialists on systems development teams.



7.

Emphasize the distinction between logical and physical system design by coverage of such
topics as denormalization; such topics clarify the different obje
ctives of logical and physical
design (which is basically semantic richness of describing information requirements versus
efficient and secure data processing).



8.

Discuss database design for Internet
-
based electronic commerce applications.



Classroom I
deas



1.

This chapter, like Chapter 10, covers a topic addressed in most database management courses.
Depending on your curriculum, this chapter may review previously covered material or
introduce material covered (in more depth) in a subsequent course
. However, logical database
design is not strictly a database topic, but is essential for thorough systems analysis, thus it is an
activity that should not be assigned to only specialists (database analysts). Although you are
strongly encouraged to cover

this chapter in your systems analysis and design course, you should
coordinate how you address this topic with those who teach database courses. Chapter 12 is
carefully written for the systems analysis and design student. This chapter is an excellent
re
fresher for students who have studied the relational data model and normalization and provides
a solid introduction to these topics for those students who will address this topic later in a
database management course.

Chapter 12

Designing Da
tabases: Logical Data Modeling


151



2.

Emphasize to your students that
logical data modeling is still technology independent. A logical
data model is not necessarily mapped on a one
-
to
-
one basis into a physical database design. The
purpose of logical database modeling is to prepare the description of stored data requirement
s
into a format that makes it easier for physical database design decisions to be made. Students
may want to go directly from conceptual data modeling to physical database design, so spend
some time discussing the purpose of logical database modeling.



3.

Review the key steps of logical database modeling. It is important that novice data modelers
understand that all four steps are necessary to produce a thorough logical data model as input to
physical database design.



4.

Understanding the relationa
l data model is critical for doing logical database modeling. The
relational model is fairly simple, and since most students have experience with a PC database
management system, this model is intuitive for them. Emphasize the five properties of relation
s
and the concept of anomalies.



5.

You can introduce normalization to your students from two perspectives. One approach is to
introduce first, second, and third normal forms and teach your students to transform
unnormalized data into third normal for
m by stepping through each normal form in sequence.
Another approach is to emphasize functional dependencies and determinants (see Problem and
Exercise 6). Use whichever approach is most comfortable for you; this chapter supports either
approach.



6.

C
onsider spending a significant portion of your class periods (allocated to Chapter 12) working
problems that show how to translate between E
-
R and relational data models. Table 12
-
1 is a
compact summary of how to map E
-
R constructs into relational constru
cts. Your students
should become competent with translating in either direction. Problems and Exercises 2, 3, 4,
and 6 are suitable for in
-
class exercises, but you should create other examples. Work a few
examples for your students, and then have your s
tudents work (either individually or in small
teams) on several problems in class and then present their answers. Practice is the best teacher
of both normalization and translating between relational and E
-
R models.



7.

Emphasize to your students that m
ost E
-
R models developed during analysis are incomplete
since system inputs and outputs are not designed in detail until logical design. Use this
explanation to motivate the need for view integration. Be sure to discuss the potential pitfalls
(view integ
ration problems) that make view integration more than a mechanical process. Again,
use many examples; Problem and Exercise 4 is a fairly simple one that students can work on
inside or outside of class. Ask your students why view integration problems aris
e; for example,
sample problems include independent analysts or project teams of different subsystems with
slightly different data semantics, the integration of multiple independently
-
developed
applications coming together to create an enterprise data mode
l, and imprecision or lack of
naming standards by analysts.



8.

You should review in class the Hoosier Burger example found in the “Logical Database Design
For Hoosier Burger” section. This example illustrates how to deal with both translating an E
-
R
m
odel into 3NF relations as well as integrating specific system requirements into the data model
during logical database design.



Modern Systems Analysis and Design, 3
rd

edition

Instructor’s Manual


152


9.

You should emphasize to your students that physical database design is technology dependent,
as well as logical requiremen
ts dependent. To make physical data storage decisions, one must
know what constraints and opportunities available technologies have. What file organizations
do the available database management systems have? What data types do the operating system
and o
ther system software support? What are the physical characteristics of secondary memory
devices and what overhead space does data management software require? These are questions
that technical specialists can answer; systems analysts, with application k
nowledge, along with
technical specialists armed with answers to these questions can together perform physical file
and database design.


10.

Discuss with your students the role of the CASE repository as a central source of information
necessary to make p
hysical design decisions and as a place where these decisions are stored.
You can also point out the interactive nature of some logical and physical design issues, such as
choosing field data types, and that a CASE repository helps to synchronize logical
and physical
design activities. For example, when prototyping computer displays or reports with many design
tools, analysts may have to select the data type and length of fields before these fields are placed
on the prototype. If this display or report g
enerator is a module of the CASE tool, then it draws
on the CASE repository for necessary information in building the user interface.


11.

Emphasize the important role analysts play in designing data integrity controls. Analysts are
essential because des
igners need an in
-
depth knowledge of the business area and application to
choose default values, picture formats, and range constraints; to determine whether null values
should be allowed; and if a null value is present, how to process the data.


12.

Revi
ew the types of file organizations. Discuss the seven factors that analysts consider when
selecting a file organization. Have students evaluate each file organization with respect to each
of the seven factors. Finally, review Table 12
-
3 as a way summari
ze the discussion.


13.

Often when students prepare system development projects for the systems analysis and design
course, they do not include adequate file and database controls into the system. Call your
students’ attention to the section “Designing C
ontrols for Files” and emphasize that controls are
essential elements of a system and must be designed into the system, not added as an
afterthought.




Chapter 12

Designing Da
tabases: Logical Data Modeling


153

Answers to Key Terms


Suggested answers are provided below. These answers are presented top
-
down, l
eft to right.


21.

Relation



3.

Default value

28.

Well
-
structured relation


14.

Null value

13.

Normalization


16.

Physical table


8.

Functional dependency



4.

Denormalization

23.

Second normal form (2NF)


15.

Physical file

27.

Third normal f
orm (3NF)



6.

File organization


7.

Foreign key


17.

Pointer

20.

Referential integrity


25.

Sequential file organization

19.

Recursive foreign key


12.

Indexed file organization

26.

Synonyms


11.

Index

10.

Homonym


24.

Secondary key


5.

Fie
ld



9.

Sequential file organization


2.

Data type


18.

Primary key


1.

Calculated (or computed or derived) field


22.

Relational database model



Answers to Review Questions





1.

The purpose of normalization is to rid relations of anomalies. The
goal is to form well
-
structured relations that are simple and stable when data values change or data are added or
deleted.



2.

The five properties of relations are entries in columns are simple, entries in columns are from the
same set of values, each row

is unique, the sequence of columns is insignificant, and the
sequence of rows is insignificant.



3.

Synonyms, homonyms, transitive dependencies, and class/subclass relationships can arise during
view integration. Synonyms occur when two or more differen
t names are used for the same
attribute from different user views. Homonyms occur when two or more attributes from
different user views have the same name. Transitive dependencies are functional dependencies
between non
-
key attributes that arise when fun
ctionally dependent non
-
keys come from different
user views. Class/subclass relationships refer to relations representing the same entity from
different user views actually represent different subsets of the same entity type.



4.

Relationships between en
tities are represented in several ways in the relational data model. A
binary 1:
M

relationship is represented by placing a foreign key (the primary key of the entity on
the one side of the relationship) in the relation for the entity on the many side of t
he relationship.
In a binary 1:1 relationship, a foreign key is placed in the relation on either side of the
relationship or on both sides. For a binary and higher degree
M
:
N

relationship, a relation is
created with a primary key, which is the concatenat
ion of the primary keys from the related
entities. In a unary relationship, a recursive foreign key is added to the relation.



Modern Systems Analysis and Design, 3
rd

edition

Instructor’s Manual


154


5.

The fundamental rule of normalization is that each non
-
key attribute must be fully functionally
dependent on the whole prim
ary key attribute (a non
-
key is dependent on the whole key and
nothing but the key). Thus, there can be no functional dependencies between non
-
keys.




6.

A foreign key is identified by using a dashed underline.



7.

Instances in a relation cannot prove t
hat a functional dependency exists; however, you can use
sample data to demonstrate that a functional dependency does not exist. The sample data does
not show you every possible instance, only a sampling. Knowledge of the problem domain is a
reliable met
hod for identifying functional dependencies.



8.

The choice of data type often limits the possible values that may be stored for a field. For
example, a numeric data type forbids alphabetic characters. Some data types have an assumed
length (e.g., SMA
LLINT) that places an implicit range control on values. Data type may also
limit the kinds of data manipulations possible, thus further controlling the integrity of the data or
results from manipulating the data. For example, a DATE data type causes addi
tion and
subtraction to be limited by rules about dates.



9.

A referential integrity control requires the data management software to access other data records
to determine if the value is permitted, whereas a range control is checked by looking up valu
es
outside the files and database, in a repository or other source of metadata.


10.

The purpose of denormalization is to physically locate data close to one another if they are often
needed together for processing, thus minimizing secondary memory I/O op
erations. Because
normalization forces all attributes dependent on the same primary key to be logically placed in
one relation, a relation can become quite diverse with many attributes. For example, a PART
relation might have attributes related to engine
ering, production, accounting, and marketing. It
is likely that in only rare instances attributes from two or more of these areas are needed in the
same data processing steps. If the attributes were stored in the same physical record, the record
would be

long and it would take more time to access this file than if the file were divided into
segments and each segment had the same primary key and only fields for those attributes used
together.


11.

The factors that influence the decision to create an index
are the data retrieval, insertion,
deletion, and updating costs with and without the index. Indexes allow for rapid random
retrieval and sorting of data, but indexes create additional storage and maintenance costs.


12.

Data compression techniques are pa
ttern matching and other methods that replace repeating
strings of characters with codes of shorter lengths, thus reducing data storage requirements.


13.

The two goals of designing physical tables are the efficient use of secondary storage and data
proces
sing speed.


14.

The seven factors to considered when selecting a file organization are: (1) fast data retrieval, (2)
high throughput for processing transactions, (3) efficient use of storage space, (4) protection
from failures or data loss, (5) minimizi
ng need for reorganization, (6) accommodating growth,
and (7) security from unauthorized use.



Chapter 12

Designing Da
tabases: Logical Data Modeling


155

Answers to Problems and Exercises



1.

The wording in the problem clarifies some of the relationships and associated cardinalities.
Since VENDOR is functionall
y dependent on COMPNAME, there can be at most one vendor
for each component (and we assume that there may be no vendor for a given component). Also,
since COMPNAME is functionally dependent on PRODNAME, there can be at most one
component per product (a ra
ther odd situation, but that is what the wording says; we assume that
some products have no components). Although not clarified in the problem, we assume that a
product is assigned to exactly one salesperson, while a salesperson can be assigned one
-
to
-
man
y
products. Given these clarifications, the 3NF relations are (foreign keys are in italics):


PRODUCT (
PRODNAME
,
SALESPERSON
,
COMPNAME
)

SALESPERSON (
SALESPERSON
)

COMPONENT (
COMPNAME
,
VENDOR
)

VENDOR (
VENDOR
)



2.

Listed below are sample 3NF relations for t
he conceptual data model diagram in Figure 10
-
3.
We have created a few, representative attributes to make this normalization meaningful (foreign
keys are in italics). The SHIPMENT and PRODPLAN relations contain nonkey attributes,
whereas the PRODITEM and

SUPPITEM relations, the result of the many
-
to
-
many
relationships, have no nonkey attributes, since these are not shown as associative entities in
Figure 10
-
3. We have not, at this point, created proper primary keys.


SUPPLIER (
SUPPNAME
, SUPPADDRESS)

SHIP
MENT (
SHIPID
,
SUPPNAME
,
ITEMNAME
, SHIPDATE)

ITEM (
ITEMNAME
, ITEMDESC)

SUPPITEM (
SUPPNAME
,
ITEMNAME
)

PRODUCT (
PRODNAME
, PRODDESC)

PRODITEM (
PRODNAME
,
ITEMNAME
)

PRODPLAN (
PRODPLANNO
,
ITEMNAME
,
MASTSCHEDNO
, QUANTITY)

MASTSCHED (
MASTSCHEDNO
,
PRODNAME
, MASTSCHE
DDATE)



3.

Listed below are 3NF relations for the E
-
R diagram on Figure 12
-
21. Foreign keys are in italics.
QUOTE QUANTITY is not sufficient as the primary key of the PRICEQUOTE relation, but this
attribute along with the primary keys of the two associat
ed entities is a sufficient concatenated
(or composite) primary key for this associative entity. The PART RECEIPT entity is called a
weak or attributive entity, since its existence depends on a PRICE QUOTE entity instance. In
this case, however, the PART

RECEIPT entity has its own primary key, ORDER NO. The 3NF
relations are:


VENDOR (
VENDOR NO
, ADDRESS)

PRICEQUOTE (
VENDOR NO
,
ITEM NO
,
QUOTE QUANTITY
, PRICE)

PART (
ITEM NO
, DESC)

PART RECEIPT (
ORDER NO
,
VENDOR NO
,
ITEM NO
,
QUOTE QUANTITY
, DATE,
ORDER QUAN
TITY)



4.

Listed below are merged 3NF relations for this hospital example. This is an interesting exercise
because it points out how semantically lacking the relational data model is, since questions arise
about functional dependencies across separately
developed relations. One observation is clear:
The second 3NF PATIENT relation in this exercise has only one value of TREATMENT

Modern Systems Analysis and Design, 3
rd

edition

Instructor’s Manual


156

DESCRIPTION, so each patient must be associated with only one treatment, otherwise this
relation would not be in 3NF. But, we
must also assume that each department has only one
supervisor and each supervisor can supervise only one department. This last assumption means
that we could create only a supervisor or a department relation, but not both. This is sufficient
because, if
the original set of six 3NF relations is comprehensive, there are no nonkey attributes
dependent on either DEPARTMENT or SUPERVISOR ID. We, however, create both
supervisor and department relations, with a one
-
to
-
one relationship between them, to allow for

some evolution of the data model. One additional assumption about supervisors:
SUPERVISOR is a separate entity from PHYSICIAN. We also assume that the attribute
ADDRESS means the same address in both PATIENT relations, and further there are no other
sy
nonyms or homonyms across the relations. Interestingly, there is no relationship between
patient and physician implied in the original 3NF relations, and we assume none exists. With
these assumptions, the merged relations are (foreign keys are in italics
):


PATIENT (
PATIENT NO
, ADDRESS,
ROOM NO
, ADMIT DATE,
TREATMENT ID
)

ROOM (
ROOM NO
, PHONE, DAILY RATE)

PHYSICIAN (
PHYSICIAN ID
, NAME,
DEPARTMENT ID
)

TREATMENT (
TREATMENT ID
, DESCRIPTION, COST)

SUPERVISOR (
SUPERVISOR ID
,
DEPARTMENT ID
)

DEPARTMENT (
DEPARTMEN
T ID
,
SUPERVISOR ID
)





To create the E
-
R diagram from these 3NF relations, we have to make additional assumptions
about minimum cardinalities. We assume that every patient is assigned a room, but a room may
be empty; not all treatments have to be associ
ated with a patient, but a patient has to have a
treatment; and that each department has one supervisor and each supervisor has one department.
We show relationships from both a department and a supervisor to a physician, but only one is
necessary; we als
o assume that a physician must be associated with both a department and a
supervisor. This is an interesting E
-
R diagram since it contains two, disconnected parts. This is
possible, although rare in actual organizations. A suggested E
-
R diagram, including

attributes,
for this situation is presented below.





SUPERVISOR ID
DEPARTMENT ID
NAME
PHYSICIAN ID
DEPARTMENT
SUPERVISOR
Supervises
Supervises
Works
for
Works
in
PATIENT
PATIENT NO
ADDRESS
Given
Located
in
TREATMENT
ROOM
ADMIT DATE
DAILY RATE
PHONE
ROOM NO
COST
DESCRIPTION
TREATMENT ID
PHYSICIAN
Application

E
-
R Diagram

Chapter 12

Designing Da
tabases: Logical Data Modeling


157


5.

There are several foreign keys in these relations. OFFICER ID is a foreign key in OFFICE
referencing MEMBER ID from the MEMBER relation. OFFICE NAME is a foreign key in
EXPENSE referencing OFF
ICE NAME in the OFFICE relation. OFFICER IN CHARGE is a
foreign key in COMMITTEE referring to OFFICER ID or OFFICE NAME (which is not clear
from simply the relations) in the OFFICE relation. EXPENSE LEDGER NUMBER is a foreign
key in PAYMENT referencing L
EDGER NUMBER in the EXPENSE relation. MEMBER ID
in both RECEIPT and WORKERS cross
-
references MEMBER ID in the MEMBER relation.
COMMITTEE ID in WORKERS cross
-
references COMMITTEE ID in COMMITTEE.




See the accompanying E
-
R diagram. For simplicity, we do

not show attributes on this E
-
R
diagram. The WORKERS relation exists because of a many
-
to
-
many relationship, Works On,
between MEMBER and COMMITTEE.




It is inferred that a member sometimes has many receipts, but a receipt must have a member. An
expens
e sometimes has multiple payments, but each payment must have an expense. Each office
sometimes has multiple expenses, but each expense must have an office. Each office may have a
member as an officer
-
in
-
charge, and each member sometimes holds many offic
es. An office
sometimes is responsible for many committees, and each committee must have an office in
charge (although that office may not have a member assigned as officer). Committees
sometimes have many workers, and each worker sometimes works on many

committees. The E
-
R diagram is more expressive in that it displays explicitly the minimum cardinalities of
relationships and shows exactly which entities are related.









6.

Since there are four determinants among the functional dependencies, there

will be four
relations. The last functional dependency, the one with only a three
-
key composite determinant,
MEMBER
Incurs
Submits
RECEIPT
Holds
OFFICE
EXPENSE
In Charge
of
Reimburse
PAYMENT
COMMITTEE
Works
on
Fraternity

E
-
R Diagram


Modern Systems Analysis and Design, 3
rd

edition

Instructor’s Manual


158

signifies all the dates on which a particular applicant interviewed for a particular position. This
functional dependency does not signify a many
-
to
-
many relationship, like many composite keys
do, since date interviewed is itself not a determinant. It signifies an entity with a three
-
component composite key. The four 3NF relations are:


APPLICANT (
APPLICANT ID
, APPLICANT NAME, APPLICANT ADDRESS)

P
OSITION (
POSITION ID
, POSITION TITLE, DATE POSITION OPENS,
DEPARTMENT)

APPLICATION (
APPLICANT ID
,
POSITION ID
, DATE APPLIED)

INTERVIEW(
APPLICANT ID
,
POSITION ID
,
DATE INTERVIEWED
)





A suggested E
-
R diagram is provided below. For clarity, we show composi
te keys as a single
composite attribute.








7.

The objectives of a good coding scheme are to minimize storage space and to increase data
integrity. Student major is a classical example of a sparse field that can benefit from being
codified for stora
ge and data entry. For on
-
line data entry, you could provide a list of possible
majors from which the data entry person must choose; this legal list of codes will need to be
updated, but this can probably be done separately from data entry of student data
. A
fundamental coding choice is whether to use codes that are as dense as possible or to try to use
reasonably dense codes that have some meaning to most users (e.g., MIS versus 31 for a
Management Information Systems major). Short character string code
s (e.g., three alphabetic
characters) may take as little storage as a two digit numeric code, so a short character string may
achieve both objectives. An interesting way to approach this question is to have your students
identify how their university and
at least two other universities represent a student’s major.
Have your students compare and contrast these coding schemes. It is likely that they will locate
universities where the code is numeric, alphabetic, or alphanumeric.


APPLICANT
Applies
APPLICATION
Responds
to
POSITION
Interviews
for
INTERVIEW
APPLICANT ID
DATE APPLIED
APPLICANT ID + POSITION
POSITION ID
APPLICANT ADDRESS
APPLICANT NAME
APPLICANT ID + POSITION ID + DATE INTERVIEWED
POSITION TITLE
DEPARTMENT
DATE POSITION OPENS
Application

E
-
R Diagram

Chapter 12

Designing Da
tabases: Logical Data Modeling


159


8.

Suggestions for the
primary keys are underlined in the following relations. Attributes appearing
in italics serve as foreign keys.


VENDOR (
VENDOR NO
, ADDRESS)

PRICEQUOTE (
VENDOR NO
,
ITEM NO
,
QUOTE QUANTITY
, PRICE)

PART (
ITEM NO
, DESC)

PART RECEIPT (
ORDER NO
,
VENDOR NO, ITEM

NO, QUOTE QUANTITY
, DATE,
ORDER QUANTITY)


If VENDOR NO, ITEM NO, and ORDER NO are numeric values assigned without any
relationship to the associated entities, then these numbers would not change as the real world
changes, and they would be acceptable as
primary keys or components of composite primary
keys. QUOTE QUANTITY, on the other hand, is likely volatile, and is not suitable as part of
the primary key for the PRICEQUOTE table. We need to create a nonintelligent primary key
for the PRICEQUOTE table.

Also, we still may need individual secondary index keys on
VENDOR NO and ITEM NO attributes from this relation to facilitate joining the
PRICEQUOTE table with the VENDOR and PART tables, respectively.



9.

The guidelines for identifying keys for indexing

suggest that attributes used for selection,
sorting, grouping, and joining are potential candidates. We are asked to consider the three
queries as the only accesses to the database, so we do not implicitly need primary key indices.
We do not know the fr
equency of the three queries compared to update operations, so it is
impossible to make precise, optimal decisions on the most economical indices. Space does not
seem to be an issue since, as we will see, none of the qualifying fields are very long. So,
we
indicate all possible indices that might speed query processing. The first query, Query A,
appears to access the PART and PRICEQUOTE tables, but actually the PRICEQUOTE table is
sufficient. The E
-
R diagram indicates that every part has at least one ve
ndor, so every part has at
least one price quote. Thus, all parts appear at least once in the PRICEQUOTE table. The data
reported in Query A (ITEM NO, VENDOR NO, QUOTE QUANTITY, and QUOTE PRICE)
are found in the PRICEQUOTE table. The query asks for the
results to be sorted by ITEM NO
and by VENDOR NO (but not by quote quantity). A composite index on these two attributes is
the most efficient way to directly produce this sorted output, thus avoiding the need to do a sort
of the data once it is retrieved.

This assumes that the DBMS or file system utilizes a composite
index when it sees the compound sorting condition. Thus, for Query A, a secondary composite
index on first ITEM NO and then VENDOR NO is ideal.




The second query, Query B, asks to displa
y all the attributes from the PART RECEIPT and
PART tables; since ITEM NO is an attribute in the PART RECEIPT relation, we do not need
the PRICEQUOTE relation to link the PART RECEIPT and PART tables. No sorting is done,
but the query wants data for only
a specified part receipt date. Assuming that many days of part
receipt data are kept, then we would want to create a secondary key index on DATE and a
secondary key index on ITEM NO in the PART RECEIPT table and a primary key index on
ITEM NO in the PART
table to efficiently support the selection and joining needed for Query B.




The third query, Query C, involves selection of a particular vendor and display of attributes from
all associated PRICEQUOTE rows for that vendor; thus, all the data needed for

this query can
be found in the PRICEQUOTE table. No sorting is mentioned. Thus, for Query C, only a
secondary key index on VENDOR NO in the PRICEQUOTE table would support efficient
processing.



Modern Systems Analysis and Design, 3
rd

edition

Instructor’s Manual


160

10.

This question highlights the difficulty of setting use
ful default values. Our analysis also suggests
the potential difficulty of choosing a range check on the age attribute. The bulk of students at
most universities typically range in age from 18 to 23. Some students attend university before
age 18. For e
xample, some students graduate from high school at age 17 or opt to take a high
school equivalency examination and finish high school early (e.g., while younger than 18). In
addition, sometimes high school students attend college classes while still enrol
led in high
school. Finally, there is the occasional child prodigy who attends college at an earlier than usual
age. For the upper limit on the age range, there will obviously be some students older than 23,
particularly if there are undergraduate studen
ts who are working adults or who return to school
after other careers (such as military service or child rearing). Similarly, if there are graduate
students at this university then there will most certainly be students older than 23. In theory, a
student
’s age could reach three digits. If we could have only one default value, we could study
several years of student data and pick the modal age value. If we could have default values
conditional on student type (e.g., undergraduate, graduate, returning adu
lt), then we could use
the modal values for each student type. Thus, we would want values for student type to be
entered before student age. Alternatively, an impertinent student might respond to this question
by saying why even store age. Store the bir
thday and let the system calculate age when needed.


11.

One suggested answer is a situation where an employee needs to know a contact name for a
vendor, a quoted price for a particular item, and a description for the item. If the tables are
denormalized
, this query requires the joining of only two tables; however, if we are using
normalized relations, this query requires the joining of all three relations.



Guidelines for Using the Field Exercises



1.

Students can search through academic journals, text
books, or the Web. They are likely to find
information about other normal forms such as Boyce
-
Codd normal form (remove remaining
anomalies resulting from functional dependencies), fourth normal form (removing multivalued
dependencies), and fifth normal fo
rm (removing remaining anomalies). Students are likely to
encounter these other normal forms in a database class (often from a Computer Science
department) or on the job, particularly in a job where they are developing a relatively large,
complex database
. As with all normal forms, each rids the data model of potential redundancies
and inconsistencies.



2.

Normalized relations, a set of specifications specifying the format and structure of the data in
secondary storage, and an updated CASE repository are

several of the deliverables from file and
database design.



3.

Students should identify several types of information that are most useful for file and database
design. Information as it relates to primary keys, secondary keys, data types, business r
ules,
relationships, data integrity control methods, volume, present and future required data storage
space, and file organization should be collected.



4.

There are probably a number of DBMSs available at the student’s university. For example, the
stude
nt’s PC laboratory is likely to have at least one DBMS available; perhaps Microsoft Access
for Windows is available. A client
-
server or object
-
oriented DBMS might be available as well.
In addition, the university’s administrative database applications ar
e probably run on a
mainframe or minicomputer. A DBMS such as IBM’s DB2 or the Oracle DBMS are probably
being used. The data types supported by these DBMSs provide important criteria for
Chapter 12

Designing Da
tabases: Logical Data Modeling


161

determining what types of applications each of these DBMSs is best
suited. PC DBMSs tend to
have fewer physical file and database design options than do mini
-

or mainframe computer
-
based DBMSs. In addition, students should consider the number of records and physical file
organizations that the DBMS allows, the available

storage space on physical storage mediums,
the record locking and other security features that the DBMS provides, the availability of a
CASE tool, and other factors.



5.

Your students should find that many of the physical file and database design issues
presented in
the chapter are dealt with at their university. For instance, storage space, file organizations, and
data types are issues that must be addressed.



Guidelines for Using the Broadway Entertainment Company Cases


Guidelines and answers for usi
ng the Broadway Entertainment Company cases are available on this
textbook’s companion Web site. Please visit www.prenhall.com/hoffer to access this information.