DATA STRUCTURES - ANSWERS


5 MARKS

(1) Data Structure

In computer science, a data structure is a particular way of storing and organizing data in a computer so that it can be used efficiently.[1][2]

Different kinds of data structures are suited to different kinds of applications, and some are highly specialized to specific tasks. For example, B-trees are particularly well-suited for implementation of databases, while compiler implementations usually use hash tables to look up identifiers.

Data structures provide a means to manage huge amounts of data efficiently, such as large databases and internet indexing services. Usually, efficient data structures are a key to designing efficient algorithms. Some formal design methods and programming languages emphasize data structures, rather than algorithms, as the key organizing factor in software design.

Overview

- An array data structure stores a number of elements of the same type in a specific order. They are accessed using an integer to specify which element is required (although the elements may be of almost any type). Arrays may be fixed-length or expandable.

- Record (also called tuple or struct): records are among the simplest data structures. A record is a value that contains other values, typically in fixed number and sequence and typically indexed by names. The elements of records are usually called fields or members.

- A hash (or dictionary or map) is a more flexible variation on a record, in which name-value pairs can be added and deleted freely.

- Union. A union type definition will specify which of a number of permitted primitive types may be stored in its instances, e.g. "float or long integer". Contrast with a record, which could be defined to contain a float and an integer; whereas, in a union, there is only one value at a time.

- A tagged union (also called a variant, variant record, discriminated union, or disjoint union) contains an additional field indicating its current type, for enhanced type safety.

- A set is an abstract data structure that can store specific values, without any particular order, and no repeated values. Values themselves are not retrieved from sets; rather, one tests a value for membership to obtain a boolean "in" or "not in".

- An object contains a number of data fields, like a record, and also a number of program code fragments for accessing or modifying them. Data structures not containing code, like those above, are called plain old data structures.

Many others are possible, but they tend to be further variations and compounds of the above.
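
To make the record, union, and tagged union entries above concrete, here is a minimal C sketch; the type and field names are illustrative and not taken from the text above.

    #include <stdio.h>

    /* A record (struct): a fixed set of named fields. */
    struct point { double x; double y; };

    /* A union: one storage area that may hold a float OR a long, but only one at a time. */
    union number { float f; long i; };

    /* A tagged union: the union plus a field recording which member is currently valid. */
    struct tagged_number {
        enum { AS_FLOAT, AS_LONG } tag;
        union { float f; long i; } value;
    };

    int main(void) {
        struct tagged_number n = { .tag = AS_LONG, .value.i = 42 };
        if (n.tag == AS_LONG)
            printf("long: %ld\n", n.value.i);
        return 0;
    }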

Basic principles

Data structures are generally based on the ability of a computer to fetch and store data at any place in its memory, specified by an address, a bit string that can be itself stored in memory and manipulated by the program. Thus the record and array data structures are based on computing the addresses of data items with arithmetic operations, while the linked data structures are based on storing addresses of data items within the structure itself. Many data structures use both principles, sometimes combined in non-trivial ways (as in XOR linking).

The implementation of a data structure usually requires writing a set of procedures that create and manipulate instances of that structure. The efficiency of a data structure cannot be analyzed separately from those operations. This observation motivates the theoretical concept of an abstract data type, a data structure that is defined indirectly by the operations that may be performed on it, and the mathematical properties of those operations (including their space and time cost).
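
The two addressing principles can be illustrated with a short C sketch (names are illustrative; it assumes the usual contiguous layout of C arrays):

    #include <stdio.h>

    struct node { int value; struct node *next; };  /* linked structure: stores addresses */

    int main(void) {
        int a[5] = { 10, 20, 30, 40, 50 };

        /* Array: the address of a[3] is computed arithmetically from the base address. */
        printf("%p == %p\n", (void *)&a[3], (void *)(a + 3));

        /* Linked list: each element stores the address of the next one explicitly. */
        struct node second = { 20, NULL };
        struct node first  = { 10, &second };
        printf("%d -> %d\n", first.value, first.next->value);
        return 0;
    }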

Language support

Most assembly languages and some low-level languages, such as BCPL (Basic Combined Programming Language), lack support for data structures. Many high-level programming languages and some higher-level assembly languages, such as MASM, on the other hand, have special syntax or other built-in support for certain data structures, such as vectors (one-dimensional arrays) in the C language or multi-dimensional arrays in Pascal.

Most programming languages feature some sort of library mechanism that allows data structure implementations to be reused by different programs. Modern languages usually come with standard libraries that implement the most common data structures. Examples are the C++ Standard Template Library, the Java Collections Framework, and Microsoft's .NET Framework.

Modern languages also generally support modular programming, the separation between the interface of a library module and its implementation. Some provide opaque data types that allow clients to hide implementation details. Object-oriented programming languages, such as C++, Java and the .NET Framework, may use classes for this purpose.

Many known data structures have concurrent versions that allow multiple computing threads to access the data structure simultaneously.

(2) Programming Language

A programming language is an artificial language designed to communicate instructions to a machine, particularly a computer. Programming languages can be used to create programs that control the behavior of a machine and/or to express algorithms precisely.

The earliest programming languages predate the invention of the computer, and were used to direct the behavior of machines such as Jacquard looms and player pianos. Thousands of different programming languages have been created, mainly in the computer field, with many being created every year. Most programming languages describe computation in an imperative style, i.e., as a sequence of commands, although some languages, such as those that support functional programming or logic programming, use alternative forms of description.

The description of a programming language is usually split into the two components of syntax (form) and semantics (meaning). Some languages are defined by a specification document (for example, the C programming language is specified by an ISO Standard), while other languages, such as Perl 5 and earlier, have a dominant implementation that is used as a reference.

Definitions

A programming language is a notation for writing programs, which are specifications of a computation or algorithm.[1] Some, but not all, authors restrict the term "programming language" to those languages that can express all possible algorithms.[1][2] Traits often considered important for what constitutes a programming language include:

- Function and target: A computer programming language is a language[3] used to write computer programs, which involve a computer performing some kind of computation[4] or algorithm and possibly controlling external devices such as printers, disk drives, robots,[5] and so on. For example, PostScript programs are frequently created by another program to control a computer printer or display. More generally, a programming language may describe computation on some, possibly abstract, machine. It is generally accepted that a complete specification for a programming language includes a description, possibly idealized, of a machine or processor for that language.[6] In most practical contexts, a programming language involves a computer; consequently, programming languages are usually defined and studied this way.[7] Programming languages differ from natural languages in that natural languages are only used for interaction between people, while programming languages also allow humans to communicate instructions to machines.

- Abstractions: Programming languages usually contain abstractions for defining and manipulating data structures or controlling the flow of execution. The practical necessity that a programming language support adequate abstractions is expressed by the abstraction principle;[8] this principle is sometimes formulated as a recommendation to the programmer to make proper use of such abstractions.[9]

- Expressive power: The theory of computation classifies languages by the computations they are capable of expressing. All Turing complete languages can implement the same set of algorithms. ANSI/ISO SQL and Charity are examples of languages that are not Turing complete, yet often called programming languages.[10][11]

Markup languages like XML, HTML or troff, which define structured data, are not generally considered programming languages.[12][13][14] Programming languages may, however, share the syntax with markup languages if a computational semantics is defined. XSLT, for example, is a Turing complete XML dialect.[15][16][17] Moreover, LaTeX, which is mostly used for structuring documents, also contains a Turing complete subset.[18][19]

The term computer language is sometimes used interchangeably with programming language.[20] However, the usage of both terms varies among authors, including the exact scope of each. One usage describes programming languages as a subset of computer languages.[21] In this vein, languages used in computing that have a different goal than expressing computer programs are generically designated computer languages. For instance, markup languages are sometimes referred to as computer languages to emphasize that they are not meant to be used for programming.[22] Another usage regards programming languages as theoretical constructs for programming abstract machines, and computer languages as the subset thereof that runs on physical computers, which have finite hardware resources.[23] John C. Reynolds emphasizes that formal specification languages are just as much programming languages as are the languages intended for execution. He also argues that textual and even graphical input formats that affect the behavior of a computer are programming languages, despite the fact they are commonly not Turing-complete, and remarks that ignorance of programming language concepts is the reason for many flaws in input formats.[24]

Elements

All programming languages have some primitive building blocks for the description of data and the processes or transformations applied to them (like the addition of two numbers or the selection of an item from a collection). These primitives are defined by syntactic and semantic rules which describe their structure and meaning respectively.

Syntax

A programming language's surface form is known as its syntax. Most programming languages are purely textual; they use sequences of text including words, numbers, and punctuation, much like written natural languages. On the other hand, there are some programming languages which are more graphical in nature, using visual relationships between symbols to specify a program.

The syntax of a language describes the possible combinations of symbols that form a syntactically correct program. The meaning given to a combination of symbols is handled by semantics (either formal or hard-coded in a reference implementation). Since most languages are textual, this article discusses textual syntax.

Programming language syntax is usually defined using a combination of regular expressions (for lexical structure) and Backus-Naur Form (for grammatical structure). Below is a simple grammar, based on Lisp:
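
(The grammar itself did not survive the conversion of this document; the following BNF-style rules are a reconstruction inferred from the description that follows, and the exact notation is an assumption.)

    expression ::= atom | list
    atom       ::= number | symbol
    number     ::= [+-]? digit+
    symbol     ::= letter character*        ; character = any non-whitespace character
    list       ::= '(' expression* ')'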

This grammar specifies the following:

- an expression is either an atom or a list;
- an atom is either a number or a symbol;
- a number is an unbroken sequence of one or more decimal digits, optionally preceded by a plus or minus sign;
- a symbol is a letter followed by zero or more of any characters (excluding whitespace); and
- a list is a matched pair of parentheses, with zero or more expressions inside it.

The following are examples of well-formed token sequences in this grammar: '12345', '()', '(a b c232 (1))'.

Not all syntactically correct programs are semantically correct. Many syntactically correct programs are nonetheless ill-formed, per the language's rules, and may (depending on the language specification and the soundness of the implementation) result in an error on translation or execution. In some cases, such programs may exhibit undefined behavior. Even when a program is well-defined within a language, it may still have a meaning that is not intended by the person who wrote it.

Using natural language as an example, it may not be possible to assign a meaning to a grammatically correct sentence, or the sentence may be false:

- "Colorless green ideas sleep furiously." is grammatically well-formed but has no generally accepted meaning.
- "John is a married bachelor." is grammatically well-formed but expresses a meaning that cannot be true.

The following C language fragment is syntactically correct, but performs operations that are not semantically defined (the operation *p >> 4 has no meaning for a value having a complex type, and p->im is not defined because the value of p is the null pointer):

complex *p = NULL;
complex abs_p = sqrt(*p >> 4 + p->im);

If the type declaration on the first line were omitted, the program would trigger an error on compilation, as the variable "p" would not be defined. But the program would still be syntactically correct, since type declarations provide only semantic information.

The grammar needed to specify a programming language can be classified by its position in the Chomsky hierarchy. The syntax of most programming languages can be specified using a Type-2 grammar, i.e., they are context-free grammars.[25] Some languages, including Perl and Lisp, contain constructs that allow execution during the parsing phase. Languages that have constructs that allow the programmer to alter the behavior of the parser make syntax analysis an undecidable problem, and generally blur the distinction between parsing and execution.[26] In contrast to Lisp's macro system and Perl's BEGIN blocks, which may contain general computations, C macros are merely string replacements, and do not require code execution.[27]

(3) Array Data Structure

In computer science, an array data structure or simply an array is a data structure consisting of a collection of elements (values or variables), each identified by at least one array index or key. An array is stored so that the position of each element can be computed from its index tuple by a mathematical formula.[1][2][3]

For example, an array of 10 integer variables, with indices 0 through 9, may be stored as 10 words at memory addresses 2000, 2004, 2008, … 2036, so that the element with index i has the address 2000 + 4 × i.[4]

Because the mathematical concept of a matrix can be represented as a two-dimensional grid, two-dimensional arrays are also sometimes called matrices. In some cases the term "vector" is used in computing to refer to an array, although tuples rather than vectors are more correctly the mathematical equivalent. Arrays are often used to implement tables, especially lookup tables; the word table is sometimes used as a synonym of array.

Arrays are among the oldest and most important data structures, and are used by almost every program. They are also used to implement many other data structures, such as lists and strings. They effectively exploit the addressing logic of computers. In most modern computers and many external storage devices, the memory is a one-dimensional array of words, whose indices are their addresses. Processors, especially vector processors, are often optimized for array operations.

Arrays are useful mostly because the element indices can be computed at run time. Among other things, this feature allows a single iterative statement to process arbitrarily many elements of an array. For that reason, the elements of an array data structure are required to have the same size and should use the same data representation. The set of valid index tuples and the addresses of the elements (and hence the element addressing formula) are usually,[3][5] but not always,[2] fixed while the array is in use.

The term array is often used to mean array data type, a kind of data type provided by most high-level programming languages that consists of a collection of values or variables that can be selected by one or more indices computed at run-time. Array types are often implemented by array structures; however, in some languages they may be implemented by hash tables, linked lists, search trees, or other data structures.

The term is also used, especially in the description of algorithms, to mean associative array or "abstract array", a theoretical computer science model (an abstract data type or ADT) intended to capture the essential properties of arrays.

History

The first digital computers used machine-language programming to set up and access array structures for data tables, vector and matrix computations, and for many other purposes. Von Neumann wrote the first array-sorting program (merge sort) in 1945, during the building of the first stored-program computer.[6] Array indexing was originally done by self-modifying code, and later using index registers and indirect addressing. Some mainframes designed in the 1960s, such as the Burroughs B5000 and its successors, had special instructions for array indexing that included index-bounds checking.

Assembly languages generally have no special support for arrays, other than what the machine itself provides. The earliest high-level programming languages, including FORTRAN (1957), COBOL (1960), and ALGOL 60 (1960), had support for multi-dimensional arrays, and so has C (1972). In C++ (1983), class templates exist for multi-dimensional arrays whose dimension is fixed at runtime[3][5] as well as for runtime-flexible arrays.[2]

Applications

Arrays are used to implement mathematical vectors and matrices, as well as other kinds of rectangular tables. Many databases, small and large, consist of (or include) one-dimensional arrays whose elements are records.

Arrays are used to implement other data structures, such as heaps, hash tables, deques, queues, stacks, strings, and VLists.

One or more large arrays are sometimes used to emulate in-program dynamic memory allocation, particularly memory pool allocation. Historically, this has sometimes been the only way to allocate "dynamic memory" portably.

Arrays can be used to determine partial or complete control flow in programs, as a compact alternative to (otherwise repetitive) multiple IF statements. They are known in this context as control tables and are used in conjunction with a purpose-built interpreter whose control flow is altered according to values contained in the array. The array may contain subroutine pointers (or relative subroutine numbers that can be acted upon by SWITCH statements) that direct the path of the execution.
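
A control table can be as simple as an array of subroutine pointers indexed by an operation code. The following minimal C sketch (with made-up handler names) shows the idea:

    #include <stdio.h>

    static void op_start(void) { puts("start"); }
    static void op_stop(void)  { puts("stop");  }

    /* Control table: the array's contents, not IF statements, select the subroutine. */
    static void (*const handlers[])(void) = { op_start, op_stop };

    int main(void) {
        int codes[] = { 0, 1, 0 };                    /* sequence of operation codes */
        for (size_t i = 0; i < sizeof codes / sizeof codes[0]; ++i)
            handlers[codes[i]]();                     /* dispatch through the table */
        return 0;
    }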

Array element identifier and addressing formulas

When data objects are stored in an array, individual objects are selected by an index that is usually a non-negative scalar integer. Indices are also called subscripts. An index maps the array value to a stored object.

There are three ways in which the elements of an array can be indexed:

- 0 (zero-based indexing): The first element of the array is indexed by subscript of 0.[7]
- 1 (one-based indexing): The first element of the array is indexed by subscript of 1.[8]
- n (n-based indexing): The base index of an array can be freely chosen. Usually programming languages allowing n-based indexing also allow negative index values, and other scalar data types like enumerations or characters may be used as an array index.

Arrays can have multiple dimensions, thus it is not uncommon to access an array using multiple indices. For example, a two-dimensional array A with three rows and four columns might provide access to the element at the 2nd row and 4th column by the expression A[1, 3] (in a row-major language) and A[3, 1] (in a column-major language) in the case of a zero-based indexing system. Thus two indices are used for a two-dimensional array, three for a three-dimensional array, and n for an n-dimensional array.

The number of indices needed to specify an element is called the dimension, dimensionality, or rank of the array.

In standard arrays, each index is restricted to a certain range of consecutive integers (or consecutive values of some enumerated type), and the address of an element is computed by a "linear" formula on the indices.

One-dimensional arrays

A one-dimensional array (or single dimension array) is a type of linear array. Accessing its elements involves a single subscript which can either represent a row or column index.

As an example, consider the C declaration

int anArrayName[10];

Syntax: datatype anArrayname[sizeofArray];

In the given example the array can contain 10 elements of any value available to the int type. In C, the array element indices are 0-9 inclusive in this case. For example, the expressions anArrayName[0] and anArrayName[9] are the first and last elements respectively.

For a vector with linear addressing, the element with index i is located at the address B + c × i, where B is a fixed base address and c a fixed constant, sometimes called the address increment or stride.

If the valid element indices begin at 0, the constant B is simply the address of the first element of the array. For this reason, the C programming language specifies that array indices always begin at 0; and many programmers will call that element "zeroth" rather than "first".

However, one can choose the index of the first element by an appropriate choice of the base address B. For example, if the array has five elements, indexed 1 through 5, and the base address B is replaced by B − 30c, then the indices of those same elements will be 31 to 35. If the numbering does not start at 0, the constant B may not be the address of any element.
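
In C, this linear-addressing formula can be observed directly. A small sketch (assuming the usual contiguous layout, with B the array's base address and c equal to sizeof(int)):

    #include <stdio.h>

    int main(void) {
        int a[10];
        unsigned char *B = (unsigned char *)a;      /* base address */
        size_t c = sizeof(int);                     /* address increment (stride) */

        for (int i = 0; i < 10; ++i) {
            /* &a[i] and B + c*i are the same address. */
            printf("i=%d  &a[i]=%p  B+c*i=%p\n", i, (void *)&a[i], (void *)(B + c * i));
        }
        return 0;
    }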

Multidimensional arrays

For a two-dimensional array, the element with indices i, j would have address B + c × i + d × j, where the coefficients c and d are the row and column address increments, respectively.

More generally, in a k-dimensional array, the address of an element with indices i1, i2, …, ik is B + c1 × i1 + c2 × i2 + … + ck × ik.

For example: int a[3][2];

This means that array a has 3 rows and 2 columns, and the array is of integer type. Here we can store 6 elements; they are stored linearly, starting with the first row and continuing row by row. The above array will be stored as a11, a12, a21, a22, a31, a32.

This formula requires only k multiplications and k − 1 additions, for any array that can fit in memory. Moreover, if any coefficient is a fixed power of 2, the multiplication can be replaced by bit shifting.

The coefficients ck must be chosen so that every valid index tuple maps to the address of a distinct element.

If the minimum legal value for every index is 0, then B is the address of the element whose indices are all zero. As in the one-dimensional case, the element indices may be changed by changing the base address B. Thus, if a two-dimensional array has rows and columns indexed from 1 to 10 and 1 to 20, respectively, then replacing B by B + c1 − 3c2 will cause them to be renumbered from 0 through 9 and 4 through 23, respectively. Taking advantage of this feature, some languages (like FORTRAN 77) specify that array indices begin at 1, as in mathematical tradition; while other languages (like Fortran 90, Pascal and Algol) let the user choose the minimum value for each index.
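
The row-major address formula maps directly onto C's built-in multidimensional arrays. This sketch checks that computing the offset by hand agrees with ordinary indexing (variable names are illustrative):

    #include <stdio.h>

    int main(void) {
        int a[3][2];                         /* 3 rows, 2 columns, row-major storage */
        size_t c1 = 2 * sizeof(int);         /* row address increment */
        size_t c2 = sizeof(int);             /* column address increment */
        unsigned char *B = (unsigned char *)a;

        for (int i = 0; i < 3; ++i)
            for (int j = 0; j < 2; ++j) {
                int *manual = (int *)(B + c1 * i + c2 * j);   /* B + c1*i + c2*j */
                printf("a[%d][%d]: %p == %p\n", i, j, (void *)&a[i][j], (void *)manual);
            }
        return 0;
    }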

20 MARKS


(1) Database

A database is a structured collection of data. The data are typically organized to model relevant aspects of reality (for example, the availability of rooms in hotels), in a way that supports processes requiring this information (for example, finding a hotel with vacancies).

The term database is correctly applied to the data and their supporting data structures, and not to the database management system (DBMS). The database data collection together with the DBMS is called a database system.

The term database system implies that the data are managed to some level of quality (measured in terms of accuracy, availability, usability, and resilience) and this in turn often implies the use of a general-purpose database management system (DBMS).[1] A general-purpose DBMS is typically a complex software system that meets many usage requirements to properly maintain its databases, which are often large and complex.

This is especially the case with client-server, near-real-time transactional systems, in which multiple users have access to data, and data are concurrently entered and queried in ways that preclude single-thread batch processing. Most of the complexity of those requirements is still present with personal, desktop-based database systems.

Well-known DBMSs include FoxPro, IBM DB2, Linter, Microsoft Access, Microsoft SQL Server, MySQL, Oracle, PostgreSQL and SQLite. A database is not generally portable across different DBMSs, but different DBMSs can inter-operate to some degree by using standards like SQL and ODBC together to support a single application built over more than one database. A DBMS also needs to provide effective run-time execution to properly support (e.g., in terms of performance, availability, and security) as many database end-users as needed.

A way to classify databases involves the type of their contents, for example: bibliographic, document-text, statistical, or multimedia objects. Another way is by their application area, for example: accounting, music compositions, movies, banking, manufacturing, or insurance.

The term database may be narrowed to specify particular aspects of an organized collection of data and may refer to the logical database, to the physical database as data content in computer data storage, or to many other database sub-definitions.

History

Database concept

The database concept has evolved since the 1960s to ease increasing difficulties in designing, building, and maintaining complex information systems (typically with many concurrent end-users, and with a large amount of diverse data). It has evolved together with database management systems, which enable the effective handling of databases. Though the terms database and DBMS define different entities, they are inseparable: a database's properties are determined by its supporting DBMS. The Oxford English Dictionary cites a 1962 technical report as the first to use the term "data-base". With the progress in technology in the areas of processors, computer memory, computer storage and computer networks, the sizes, capabilities, and performance of databases and their respective DBMSs have grown by orders of magnitude. For decades it has been unlikely that a complex information system can be built effectively without a proper database supported by a DBMS. The utilization of databases is now so widespread that virtually every technology and product relies on databases and DBMSs for its development and commercialization, or even may have such embedded in it. Also, organizations and companies, from small to large, heavily depend on databases for their operations.

No widely accepted exact definition exists for DBMS. However, a system needs to provide considerable functionality to qualify as a DBMS. Accordingly, its supported data collection needs to meet respective usability requirements (broadly defined by the requirements below) to qualify as a database. Thus, a database and its supporting DBMS are defined here by a set of general requirements listed below. Virtually all existing mature DBMS products meet these requirements to a great extent, while less mature ones either meet them or converge to meet them.

Evolution of database and DBMS technology

See also: Database management system#History

The introduction of the term database coincided with the availability of direct-access storage (disks and drums) from the mid-1960s onwards. The term represented a contrast with the tape-based systems of the past, allowing shared interactive use rather than daily batch processing.

In the earliest database systems, efficiency was perhaps the primary concern, but it was already recognized that there were other important objectives. One of the key aims was to make the data independent of the logic of application programs, so that the same data could be made available to different applications.

In the period since the 1970s, database technology has kept pace with the increasing resources becoming available from the computing platform: notably the rapid increase in affordable capacity and speed of disk storage and of main memory. This has enabled ever larger databases and higher throughput to be achieved.

The first generation of general-purpose database systems were navigational;[2] applications typically accessed data by following pointers from one record to another. The two main data models at this time were the hierarchical model, epitomized by IBM's IMS system, and the Codasyl model (network model), implemented in a number of products such as IDMS.

The relational model, first proposed in 1970 by Edgar F. Codd, departed from this tradition by insisting that applications should search for data by content, rather than by following links. This was considered necessary to allow the content of the database to evolve without constant rewriting of links and pointers. The relational model is made up of ledger-style tables, each used for a different type of entity. Data may be freely inserted, deleted and edited in these tables, with the DBMS (DataBase Management System) doing whatever maintenance is needed to present a table view to the application/user. The relational part comes from entities referencing other entities in what is known as a one-to-many relationship, like a traditional hierarchical model, and a many-to-many relationship, like a navigational (network) model. Thus, a relational model can express both hierarchical and navigational models, as well as its native tabular model, allowing for pure or combined modeling in terms of these three models, as the application requires.

The earlier expressions of the relational model did not make relationships between different entities explicit in the way practitioners were used to back then, but as primary keys and foreign keys. These keys, though, can also be seen as pointers in their own right, stored in tabular form. This use of keys rather than pointers conceptually obscured relations between entities, at least as it was presented back then. Thus, the wisdom at the time was that the relational model emphasizes search rather than navigation, and that it was a good conceptual basis for a query language, but less well suited as a navigational language. As a result, another data model, the entity-relationship model, which emerged shortly later (1976), gained popularity for database design, as it emphasized a more familiar description than the earlier relational model. Later on, entity-relationship constructs were retrofitted as a data modeling construct for the relational model, and the difference between the two has become irrelevant.

Earlier relational system implementations lacked the sophisticated automated optimizations of conceptual elements and operations versus their physical storage and processing counterparts, present in modern DBMSs (DataBase Management Systems), so their simplistic and literal implementations placed heavy demands on the limited processing resources at the time. It was not until the mid-1980s that computing hardware became powerful enough to allow relational systems (DBMSs plus applications) to be widely deployed. By the early 1990s, however, relational systems were dominant for all large-scale data processing applications, and they remain dominant today (2012) except in niche areas. The dominant database language is the standard SQL for the relational model, which has influenced database languages for other data models.

The rigidity of the relational model, in which all data are held in related tables with a fixed structure of rows and columns, has increasingly been seen as a limitation when handling information that is richer or more varied in structure than the traditional 'ledger-book' data of corporate information systems. These limitations come into play when modeling document databases, engineering databases, multimedia databases, or databases used in the molecular sciences.

Most of that rigidity, though, is due to the need to represent new data types other than text and text-alikes within a relational model. Examples of unsupported data types are:

- graphics (and operations such as pattern-matching and OCR);
- multidimensional constructs such as 2D (geographical), 3D (geometrical), and multidimensional hypercube models (data analysis);
- XML (a hierarchical data modeling technology evolved from EDS and HTML), used for data interchange among dissimilar systems.

More fundamental conceptual limitations came with object-oriented methodologies, with their emphasis on encapsulating data and processes (methods), as well as expressing constructs such as events or triggers. Traditional data modeling constructs emphasize the total separation of data from processes, though modern DBMSs do allow for some limited modeling in terms of validation rules and stored procedures.

Various attempts have been made to address this problem, many of them under banners such as post-relational or NoSQL. Two developments of note are the object database and the XML database. The vendors of relational databases have fought off competition from these newer models by extending the capabilities of their own products to support a wider variety of data types.

General-purpose DBMS

A DBMS has evolved into a complex software system and its development typically requires thousands of person-years of development effort. Some general-purpose DBMSs, like Oracle, Microsoft SQL Server, FoxPro, and IBM DB2, have been undergoing upgrades for thirty years or more. General-purpose DBMSs aim to satisfy as many applications as possible, which typically makes them even more complex than special-purpose databases. However, the fact that they can be used "off the shelf", as well as their amortized cost over many applications and instances, makes them an attractive alternative (versus one-time development) whenever they meet an application's requirements.

Though attractive in many cases, a general-purpose DBMS is not always the optimal solution: when certain applications are pervasive, with many operating instances, each with many users, a general-purpose DBMS may introduce unnecessary overhead and too large a "footprint" (too large an amount of unnecessary, unutilized software code). Such applications usually justify dedicated development. Typical examples are email systems, though they need to possess certain DBMS properties: email systems are built in a way that optimizes email message handling and managing, and do not need significant portions of general-purpose DBMS functionality.

Types of people involved

Three types of people are involved with a general-purpose DBMS:

1. DBMS developers - These are the people who design and build the DBMS product, and the only ones who touch its code. They are typically the employees of a DBMS vendor (e.g., Oracle, IBM, Microsoft, Sybase), or, in the case of open-source DBMSs (e.g., MySQL), volunteers or people supported by interested companies and organizations. They are typically skilled systems programmers. DBMS development is a complicated task, and some of the popular DBMSs have been under development and enhancement (also to follow progress in technology) for decades.

2. Application developers and database administrators - These are the people who design and build a database-based application that uses the DBMS. The latter group members design the needed database and maintain it. The first group members write the needed application programs which the application comprises. Both are well familiar with the DBMS product and use its user interfaces (as well as usually other tools) for their work. Sometimes the application itself is packaged and sold as a separate product, which may include the DBMS inside (see embedded database; subject to proper DBMS licensing), or sold separately as an add-on to the DBMS.

3. Application's end-users (e.g., accountants, insurance people, medical doctors, etc.) - These people know the application and its end-user interfaces, but need not know nor understand the underlying DBMS. Thus, though they are the intended and main beneficiaries of a DBMS, they are only indirectly involved with it.

Database machines and appliances

Main article: Database machine

In the 1970s and 1980s attempts were made to build database systems with integrated hardware and software. The underlying philosophy was that such integration would provide higher performance at lower cost. Examples were the IBM System/38, the early offering of Teradata, and the Britton Lee, Inc. database machine. Another approach to hardware support for database management was ICL's CAFS accelerator, a hardware disk controller with programmable search capabilities. In the long term these efforts were generally unsuccessful because specialized database machines could not keep pace with the rapid development and progress of general-purpose computers. Thus most database systems nowadays are software systems running on general-purpose hardware, using general-purpose computer data storage. However, this idea is still pursued for certain applications by some companies like Netezza and Oracle (Exadata).

Database research

Database research has been an active and diverse area, with many specializations, carried out since the early days of dealing with the database concept in the 1960s. It has strong ties with database technology and DBMS products. Database research has taken place at research and development groups of companies (notably at IBM Research, which contributed technologies and ideas to virtually every DBMS existing today), research institutes, and academia. Research has been done both through theory and prototypes. The interaction between research and database-related product development has been very productive for the database area, and many related key concepts and technologies emerged from it. Notable are the relational and the entity-relationship models, the atomic transaction concept and related concurrency control techniques, query languages and query optimization methods, RAID, and more. Research has provided deep insight into virtually all aspects of databases, though it has not always been pragmatic or effective (and it cannot and should not always be: research is exploratory in nature, and does not always lead to accepted or useful ideas). Ultimately, market forces and real needs determine the selection of problem solutions and related technologies, also among those proposed by research. However, occasionally, not the best and most elegant solution wins (e.g., SQL). Throughout their history, DBMSs and their databases have, to a great extent, been the outcome of such research, while real product requirements and challenges have triggered database research directions and sub-areas.

The database research area has several notable dedicated academic journals (e.g., ACM Transactions on Database Systems - TODS, Data and Knowledge Engineering - DKE, and more) and annual conferences (e.g., ACM SIGMOD, ACM PODS, VLDB, IEEE ICDE, and more), as well as an active and quite heterogeneous (subject-wise) research community all over the world.

(2) Data Model

A high-level data model in business or for any functional area is an abstract model that documents and organizes the business data for communication between functional and technical people. It is used to show the data needed and created by business processes.

A data model in software engineering is an abstract model that documents and organizes the business data for communication between team members and is used as a plan for developing applications, specifically how data are stored and accessed.

According to Hoberman (2009), "A data model is a wayfinding tool for both business and IT professionals, which uses a set of symbols and text to precisely explain a subset of real information to improve communication within the organization and thereby lead to a more flexible and stable application environment."[2]

A data model explicitly determines the structure of data or structured data. Typical applications of data models include database models, design of information systems, and enabling exchange of data. Usually data models are specified in a data modeling language.[3]

Communication and precision are the two key benefits that make a data model important to applications that use and exchange data. A data model is the medium through which project team members from different backgrounds and with different levels of experience can communicate with one another. Precision means that the terms and rules on a data model can be interpreted in only one way and are not ambiguous.[2]

A data model can sometimes be referred to as a data structure, especially in the context of programming languages. Data models are often complemented by function models, especially in the context of enterprise models.

Overview

Managing large quantities of structured and unstructured data is a primary function of information systems. Data models describe structured data for storage in data management systems such as relational databases. They typically do not describe unstructured data, such as word processing documents, email messages, pictures, digital audio, and video.

The role of data models

The main aim of data models is to support the development of information systems by providing the definition and format of data. According to West and Fowler (1999), "if this is done consistently across systems then compatibility of data can be achieved. If the same data structures are used to store and access data then different applications can share data. The results of this are indicated above. However, systems and interfaces often cost more than they should, to build, operate, and maintain. They may also constrain the business rather than support it. A major cause is that the quality of the data models implemented in systems and interfaces is poor".[4]

- "Business rules, specific to how things are done in a particular place, are often fixed in the structure of a data model. This means that small changes in the way business is conducted lead to large changes in computer systems and interfaces".[4]

- "Entity types are often not identified, or incorrectly identified. This can lead to replication of data, data structure, and functionality, together with the attendant costs of that duplication in development and maintenance".[4]

- "Data models for different systems are arbitrarily different. The result of this is that complex interfaces are required between systems that share data. These interfaces can account for between 25-70% of the cost of current systems".[4]

- "Data cannot be shared electronically with customers and suppliers, because the structure and meaning of data has not been standardised. For example, engineering design data and drawings for process plant are still sometimes exchanged on paper".[4]

The reason for these problems is a lack of standards that will ensure that data models will both meet business needs and be consistent.[4]


Three perspectives

A data model instance may be one of three kinds according to ANSI in 1975:[5]

- Conceptual schema: describes the semantics of a domain, being the scope of the model. For example, it may be a model of the interest area of an organization or industry. This consists of entity classes, representing kinds of things of significance in the domain, and relationship assertions about associations between pairs of entity classes. A conceptual schema specifies the kinds of facts or propositions that can be expressed using the model. In that sense, it defines the allowed expressions in an artificial 'language' with a scope that is limited by the scope of the model. The use of conceptual schemas has evolved to become a powerful communication tool with business users. Often called a subject area model (SAM) or high-level data model (HDM), this model is used to communicate core data concepts, rules, and definitions to a business user as part of an overall application development or enterprise initiative. The number of objects should be very small and focused on key concepts. Try to limit this model to one page, although for extremely large organizations or complex projects, the model might span two or more pages.[6]

- Logical schema: describes the semantics, as represented by a particular data manipulation technology. This consists of descriptions of tables and columns, object-oriented classes, and XML tags, among other things.

- Physical schema: describes the physical means by which data are stored. This is concerned with partitions, CPUs, tablespaces, and the like.

The significance of this approach, according to ANSI, is that it allows the three perspectives to be relatively independent of each other. Storage technology can change without affecting either the logical or the conceptual model. The table/column structure can change without (necessarily) affecting the conceptual model. In each case, of course, the structures must remain consistent with the other model. The table/column structure may be different from a direct translation of the entity classes and attributes, but it must ultimately carry out the objectives of the conceptual entity class structure. Early phases of many software development projects emphasize the design of a conceptual data model. Such a design can be detailed into a logical data model. In later stages, this model may be translated into a physical data model. However, it is also possible to implement a conceptual model directly.

History

One of the earliest pioneering works in modelling information systems was done by Young and Kent (1958),[7][8] who argued for "a precise and abstract way of specifying the informational and time characteristics of a data processing problem". They wanted to create "a notation that should enable the analyst to organize the problem around any piece of hardware". Their work was a first effort to create an abstract specification and invariant basis for designing different alternative implementations using different hardware components. A next step in IS modelling was taken by CODASYL, an IT industry consortium formed in 1959, who essentially aimed at the same thing as Young and Kent: the development of "a proper structure for machine independent problem definition language, at the system level of data processing". This led to the development of a specific IS information algebra.[8]

In the 1960s data modeling gained more significance with the initiation of the management information system (MIS) concept. According to Leondes (2002), "during that time, the information system provided the data and information for management purposes. The first generation database system, called Integrated Data Store (IDS), was designed by Charles Bachman at General Electric. Two famous database models, the network data model and the hierarchical data model, were proposed during this period of time".[9] Towards the end of the 1960s, Edgar F. Codd worked out his theories of data arrangement, and proposed the relational model for database management based on first-order predicate logic.[10]

In the 1970s entity relationship modeling emerged as a new type of conceptual data modeling, originally proposed in 1976 by Peter Chen. Entity relationship models were being used in the first stage of information system design during the requirements analysis to describe information needs or the type of information that is to be stored in a database. This technique can describe any ontology, i.e., an overview and classification of concepts and their relationships, for a certain area of interest.

In the 1970s G.M. Nijssen developed the "Natural Language Information Analysis Method" (NIAM), and developed this in the 1980s in cooperation with Terry Halpin into Object-Role Modeling (ORM).

Further in the 1980s, according to Jan L. Harrington (2000), "the development of the object-oriented paradigm brought about a fundamental change in the way we look at data and the procedures that operate on data. Traditionally, data and procedures have been stored separately: the data and their relationship in a database, the procedures in an application program. Object orientation, however, combined an entity's procedure with its data."[11]

Types of data models

Database model

Main article: Database model

A database model is a specification describing how a database is structured and used. Several such models have been suggested. Common models include:

- Flat model: This may not strictly qualify as a data model. The flat (or table) model consists of a single, two-dimensional array of data elements, where all members of a given column are assumed to be similar values, and all members of a row are assumed to be related to one another.

- Hierarchical model: In this model data is organized into a tree-like structure, implying a single upward link in each record to describe the nesting, and a sort field to keep the records in a particular order in each same-level list.

- Network model: This model organizes data using two fundamental constructs, called records and sets. Records contain fields, and sets define one-to-many relationships between records: one owner, many members.

- Relational model: a database model based on first-order predicate logic. Its core idea is to describe a database as a collection of predicates over a finite set of predicate variables, describing constraints on the possible values and combinations of values.

- Object-relational model: Similar to a relational database model, but objects, classes and inheritance are directly supported in database schemas and in the query language.

- Star schema: the simplest style of data warehouse schema. The star schema consists of a few "fact tables" (possibly only one, justifying the name) referencing any number of "dimension tables". The star schema is considered an important special case of the snowflake schema.

Data Structure Diagram

Main article: Data structure diagram

A data structure diagram (DSD) is a diagram and data model used to describe conceptual data models by providing graphical notations which document entities and their relationships, and the constraints that bind them. The basic graphic elements of DSDs are boxes, representing entities, and arrows, representing relationships. Data structure diagrams are most useful for documenting complex data entities.

Data structure diagrams are an extension of the entity-relationship model (ER model). In DSDs, attributes are specified inside the entity boxes rather than outside of them, while relationships are drawn as boxes composed of attributes which specify the constraints that bind entities together. The E-R model, while robust, doesn't provide a way to specify the constraints between relationships, and becomes visually cumbersome when representing entities with several attributes. DSDs differ from the ER model in that the ER model focuses on the relationships between different entities, whereas DSDs focus on the relationships of the elements within an entity and enable users to fully see the links and relationships between each entity.

There are several styles for representing data structure diagrams, with the notable difference in the manner of defining cardinality. The choices are between arrow heads, inverted arrow heads (crow's feet), or numerical representation of the cardinality.



Entity-relationship model

Main article: Entity-relationship model

An entity-relationship model (ERM) is an abstract conceptual data model (or semantic data model) used in software engineering to represent structured data. There are several notations used for ERMs.

Geographic data model

Main article: Data model (GIS)

A data model in geographic information systems is a mathematical construct for representing geographic objects or surfaces as data. For example:

- the vector data model represents geography as collections of points, lines, and polygons;
- the raster data model represents geography as cell matrices that store numeric values;
- and the triangulated irregular network (TIN) data model represents geography as sets of contiguous, non-overlapping triangles.[13]




Generic data model

Generic data models are generalizations of conventional data models. They define standardised general relation types, together with the kinds of things that may be related by such a relation type. Generic data models are developed as an approach to solve some shortcomings of conventional data models. For example, different modelers usually produce different conventional data models of the same domain. This can lead to difficulty in bringing the models of different people together and is an obstacle for data exchange and data integration. Invariably, however, this difference is attributable to different levels of abstraction in the models and differences in the kinds of facts that can be instantiated (the semantic expression capabilities of the models). The modelers need to communicate and agree on certain elements which are to be rendered more concretely, in order to make the differences less significant.

Semantic data model

Main article: Semantic data model

A semantic data model in software engineering is a technique to define the meaning of data within the context of its interrelationships with other data. A semantic data model is an abstraction which defines how the stored symbols relate to the real world.[12] A semantic data model is sometimes called a conceptual data model.

The logical data structure of a database management system (DBMS), whether hierarchical, network, or relational, cannot totally satisfy the requirements for a conceptual definition of data because it is limited in scope and biased toward the implementation strategy employed by the DBMS. Therefore, the need to define data from a conceptual view has led to the development of semantic data modeling techniques: that is, techniques to define the meaning of data within the context of its interrelationships with other data. As illustrated in the figure, the real world, in terms of resources, ideas, events, etc., is symbolically defined within physical data stores. A semantic data model is an abstraction which defines how the stored symbols relate to the real world. Thus, the model must be a true representation of the real world.[12]

Data model topics

Data architecture

Main article: Data architecture

Data architecture is the design of data for use in defining the target state and the subsequent planning needed to hit the target state. It is usually one of several architecture domains that form the pillars of an enterprise architecture or solution architecture.

A data architecture describes the data structures used by a business and/or its applications. There are descriptions of data in storage and data in motion; descriptions of data stores, data groups and data items; and mappings of those data artifacts to data qualities, applications, locations etc.

Essential to realizing the target state, data architecture describes how data is processed, stored, and utilized in a given system. It provides criteria for data processing operations that make it possible to design data flows and also control the flow of data in the system.

Data modeling

Data modeling in software engineering is the process of creating a data model by applying formal data model descriptions using data modeling techniques. Data modeling is a technique for defining business requirements for a database. It is sometimes called database modeling because a data model is eventually implemented in a database.[15]

The figure illustrates the way data models are developed and used today. A conceptual data model is developed based on the data requirements for the application that is being developed, perhaps in the context of an activity model. The data model will normally consist of entity types, attributes, relationships, integrity rules, and the definitions of those objects. This is then used as the start point for interface or database design.[4]
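As a rough sketch only, the Python code below shows one way entity types, attributes, relationships and a simple integrity rule from a conceptual data model might be written down before database design. The entity names Author and Book and all their attributes are hypothetical examples, not taken from the text.

from dataclasses import dataclass, field
from typing import List

# Hypothetical entity types with their attributes.
@dataclass
class Author:
    author_id: int
    name: str

@dataclass
class Book:
    isbn: str
    title: str
    # Relationship: a Book is written by one or more Authors.
    authors: List[Author] = field(default_factory=list)

    def add_author(self, author: Author) -> None:
        # Illustrative integrity rule: no duplicate authors on one book.
        if author not in self.authors:
            self.authors.append(author)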

Data properties

Some important properties of data for which requirements need to be met are:



definition-related properties[4]
  - relevance: the usefulness of the data in the context of your business.
  - clarity: the availability of a clear and shared definition for the data.
  - consistency: the compatibility of the same type of data from different sources.
content-related properties
  - timeliness: the availability of data at the time required and how up to date that data is.
  - accuracy: how close to the truth the data is.
properties related to both definition and content
  - completeness: how much of the required data is available.
  - accessibility: where, how, and to whom the data is available or not available (e.g. security).
  - cost: the cost incurred in obtaining the data, and making it available for use.

Data organization

Another kind of data model describes how to organize data using a database management system or other data management technology. It describes, for example, relational tables and columns or object-oriented classes and attributes. Such a data model is sometimes referred to as the physical data model, but in the original ANSI three-schema architecture, it is called "logical". In that architecture, the physical model describes the storage media (cylinders, tracks, and tablespaces). Ideally, this model is derived from the more conceptual data model described above. It may differ, however, to account for constraints like processing capacity and usage patterns.
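As a minimal sketch of this kind of model, the Python snippet below renders the hypothetical Author/Book example from earlier as relational tables and columns using the standard library's sqlite3 module; the table and column names are illustrative assumptions, not part of any standard.

import sqlite3

# Relational rendering of the hypothetical conceptual model: tables, columns,
# and a join table for the relationship between the two entities.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (
        author_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    CREATE TABLE book (
        isbn      TEXT PRIMARY KEY,
        title     TEXT NOT NULL
    );
    CREATE TABLE book_author (
        isbn      TEXT REFERENCES book(isbn),
        author_id INTEGER REFERENCES author(author_id),
        PRIMARY KEY (isbn, author_id)
    );
""")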

While data analysis is a common term for data modeling, the activity actually has more in common with the ideas and methods of synthesis (inferring general concepts from particular instances) than it does with analysis (identifying component concepts from more general ones). {Presumably we call ourselves systems analysts because no one can say systems synthesists.} Data modeling strives to bring the data structures of interest together into a cohesive, inseparable whole by eliminating unnecessary data redundancies and by relating data structures with relationships.

(3) Structural metadata

The term metadata is ambiguous, as it is used for two fundamentally different concepts (types). Although the expression "data about data" is often used, it does not apply to both in the same way.

Structural metadata, the design and specification of data structures, cannot be about data, because at design time the application contains no data. In this case the correct description would be "data about the containers of data". Descriptive metadata, on the other hand, is about individual instances of application data, the data content. In this case, a useful description (resulting in a disambiguating neologism) would be "data about data content" or "content about content", thus metacontent. Descriptive, Guide and the National Information Standards Organization concept of administrative metadata are all subtypes of metacontent.

Metadata (metacontent) is traditionally found in the card catalogs of libraries. As information has become increasingly digital, metadata is also used to describe digital data using metadata standards specific to a particular discipline. By describing the contents and context of data files, the quality of the original data/files is greatly increased. For example, a webpage may include metadata specifying what language it is written in, what tools were used to create it, and where to go for more on the subject, allowing browsers to automatically improve the experience of users.

Definition

Metadata (metacontent) is defined as data providing information about one or more aspects of the data, such as:



  - Means of creation of the data
  - Purpose of the data
  - Time and date of creation
  - Creator or author of data
  - Location on a computer network where the data was created
  - Standards used

For example, a digital image may include metadata that describes how large the picture is, the color depth, the image resolution, when the image was created, and other data. A text document's metadata may contain information about how long the document is, who the author is, when the document was written, and a short summary of the document.
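In code, such metadata is often handled as a simple set of key-value pairs. The Python sketch below mirrors the aspects listed above; every field name and value is a hypothetical example.

# Hypothetical metadata record describing a digital image file.
image_metadata = {
    "creator": "J. Smith",             # creator or author of the data
    "created": "2013-10-31T09:15:00",  # time and date of creation
    "purpose": "product catalogue",    # purpose of the data
    "width_px": 1920,                  # how large the picture is
    "height_px": 1080,
    "color_depth_bits": 24,            # color depth
    "standard": "Exif 2.3",            # standards used
}

# The data itself (the pixels) lives elsewhere; this dictionary only describes it.
print(image_metadata["creator"])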

Metadata is data. As such, metadata can be stored and managed in a database, often called a metadata registry or metadata repository.[1] However, without context and a point of reference, it can be impossible to identify metadata just by looking at it.[2] For example: by itself, a database containing several numbers, all 13 digits long, could be the results of calculations or a list of numbers to plug into an equation; without any other context, the numbers themselves can be perceived as the data. But if given the context that this database is a log of a book collection, those 13-digit numbers may now be ISBNs: information that refers to the book, but is not itself the information within the book.
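To make the example concrete, the short Python sketch below checks whether a bare 13-digit number is at least a well-formed ISBN-13, using the standard ISBN-13 check-digit rule; the sample number is just an illustrative example.

def looks_like_isbn13(number: str) -> bool:
    """Return True if the 13-digit string has a valid ISBN-13 check digit."""
    if len(number) != 13 or not number.isdigit():
        return False
    digits = [int(d) for d in number]
    # ISBN-13 checksum: digit weights alternate 1, 3, 1, 3, ...
    total = sum(d * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    return total % 10 == 0

# Without context this is just a number; with context it may describe a book.
print(looks_like_isbn13("9780306406157"))  # True: valid check digit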

The term "metadata" was coined in 1968 by Philip Bagley, in his book
"Extension of programming
language concepts"

[3]

where it is clear that he uses the term in the ISO 11179 "traditional" sense, which
is "structural metadata" i.e. "data abou
t the containers of data"; rather than the alternate sense "content
about individual instances of data content" or metacontent, the type of data usually found in library
catalogues.
[4]
[5]

Since then the fields of information management, information science, information
technology, librarianship and GIS have widely adopted the term. In these
fields the word metadata is
defined as "data about data".
[6]

While this is the generally accepted definition, various disciplines have
adopted their own more specific e
xplanation and uses of the term.

Libraries

Metadata has been used in various forms as a means of cataloging archived information. The Dewey Decimal System employed by libraries for the classification of library materials is an early example of metadata usage. Library catalogues used 3x5 inch cards to display a book's title, author, subject matter, and a brief plot synopsis, along with an abbreviated alpha-numeric identification system which indicated the physical location of the book within the library's shelves. Such data helps classify, aggregate, identify, and locate a particular book. Another form of older metadata collection is the use by the US Census Bureau of what is known as the "Long Form". The Long Form asks questions that are used to create demographic data and to find patterns of distribution.[7]

For the purposes of this article, an "object" refers to any of the following:



  - A physical item such as a book, CD, DVD, map, chair, table, flower pot, etc.
  - An electronic file such as a digital image, digital photo, document, program file, database table, etc.

Photographs

Metadata may be written into a digital photo file that will identify who owns it, copyright and contact information, what camera created the file, along with exposure information and descriptive information such as keywords about the photo, making the file searchable on the computer and/or the Internet. Some metadata is written by the camera and some is input by the photographer and/or software after downloading to a computer. However, not all digital cameras enable you to edit metadata;[8] this functionality has been available on most Nikon DSLRs since the Nikon D3 and on most new Canon cameras since the Canon EOS 7D.
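As a small sketch of reading such embedded metadata, the Python snippet below uses the Pillow imaging library to list the Exif tags of a photo; it assumes Pillow is installed and that a file named photo.jpg exists, both of which are assumptions for illustration.

from PIL import Image, ExifTags  # assumes the Pillow library is installed

# Read the Exif metadata embedded in a (hypothetical) photo file.
with Image.open("photo.jpg") as img:
    exif = img.getexif()

# Map numeric Exif tag ids to readable names and print the entries.
for tag_id, value in exif.items():
    name = ExifTags.TAGS.get(tag_id, tag_id)
    print(f"{name}: {value}")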

Photographic metadata standards are governed by organizations that develop the following standards. They include, but are not limited to:

  - IPTC Information Interchange Model (IIM) (International Press Telecommunications Council)
  - IPTC Core Schema for XMP
  - XMP - Extensible Metadata Platform (an ISO standard)
  - Exif - Exchangeable image file format, maintained by CIPA (Camera & Imaging Products Association) and published by JEITA (Japan Electronics and Information Technology Industries Association)
  - Dublin Core (Dublin Core Metadata Initiative - DCMI)
  - PLUS (Picture Licensing Universal System)

Video

Metadata is particularly useful in video, where information about its contents (such as transcripts of conversations and text descriptions of its scenes) is not directly understandable by a computer, but where efficient search is desirable.

Web pages

Web pages often include metadata in the form of meta tags. Description and keywords meta tags are commonly used to describe the Web page's content. Most search engines use this data when adding pages to their search index.
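To illustrate, the Python sketch below extracts description and keywords meta tags from a page using the standard library's html.parser; the page markup and its contents are made up for the example.

from html.parser import HTMLParser

# Hypothetical page markup containing description and keywords meta tags.
PAGE = '''<html><head>
<meta name="description" content="Notes on data structures and metadata">
<meta name="keywords" content="data structure, metadata, assembly">
</head><body>...</body></html>'''

class MetaTagParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # Collect name/content pairs from <meta> tags.
        if tag == "meta":
            attrs = dict(attrs)
            if "name" in attrs and "content" in attrs:
                self.meta[attrs["name"]] = attrs["content"]

parser = MetaTagParser()
parser.feed(PAGE)
print(parser.meta)  # {'description': '...', 'keywords': '...'}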

Creation of metadata

Metadata can be created either by automated information processing or by manual work. Elementary metadata captured by computers can include information about when a file was created, who created it, when it was last updated, file size and file extension.

Metadata types

The application of metadata is manifold, covering a large variety of fields, and there are specialised and well-accepted models to specify types of metadata. Bretherton & Singley (1994) distinguish between two distinct classes: structural/control metadata and guide metadata.[9] Structural metadata is used to describe the structure of computer systems such as tables, columns and indexes. Guide metadata is used to help humans find specific items and is usually expressed as a set of keywords in a natural language. According to Ralph Kimball, metadata can be divided into two similar categories: technical metadata and business metadata. Technical metadata corresponds to internal metadata, business metadata to external metadata. Kimball adds a third category named process metadata. On the other hand, NISO distinguishes between three types of metadata: descriptive, structural and administrative.[6] Descriptive metadata is the information used to search for and locate an object, such as title, author, subjects, keywords, publisher; structural metadata gives a description of how the components of the object are organised; and administrative metadata refers to the technical information, including file type. Two sub-types of administrative metadata are rights management metadata and preservation metadata.

Metadata structures

Metadata (metacontent), or more correctly, the vocabularies used to assemble metadata (metacontent) statements, is typically structured according to a standardized concept using a well-defined metadata scheme, including metadata standards and metadata models. Tools such as controlled vocabularies, taxonomies, thesauri, data dictionaries and metadata registries can be used to apply further standardization to the metadata. Structural metadata commonality is also of paramount importance in data model development and in database design.

Metadata syntax

Metadata (metacontent) syntax refers to the rules created to structure the fields or elements of metadata (metacontent).[10] A single metadata scheme may be expressed in a number of different markup or programming languages, each of which requires a different syntax. For example, Dublin Core may be expressed in plain text, HTML, XML and RDF.[11]

A common example of (guide) metacontent is the bibliographic classification, the subject, the Dewey Decimal class number. There is always an implied statement in any "classification" of some object. To classify an object as, for example, Dewey class number 514 (Topology) (i.e. books having the number 514 on their spine), the implied statement is "<book><subject heading><514>". This is a subject-predicate-object triple, or more importantly, a class-attribute-value triple. The first two elements of the triple (class, attribute) are pieces of structural metadata having a defined semantic. The third element is a value, preferably from some controlled vocabulary, some reference (master) data. The combination of the metadata and master data elements results in a statement which is a metacontent statement, i.e. "metacontent = metadata + master data". All these elements can be thought of as "vocabulary". Both metadata and master data are vocabularies which can be assembled into metacontent statements. There are many sources of these vocabularies, both meta and master data: UML, EDIFACT, XSD, Dewey/UDC/LoC, SKOS, ISO 25964, Pantone, Linnaean binomial nomenclature, etc. Using controlled vocabularies for the components of metacontent statements, whether for indexing or finding, is endorsed by ISO 25964: "If both the indexer and the searcher are guided to choose the same term for the same concept, then relevant documents will be retrieved." This is particularly relevant when considering that the behemoth of the internet, Google, is simply indexing and then matching text strings; there is no intelligence or "inferencing" occurring.
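As a small illustration of the class-attribute-value idea, the Python sketch below represents the implied statement for the Dewey example; the data structures and the tiny controlled vocabulary are invented for illustration, not part of any standard.

# Structural metadata: the class and attribute, with defined semantics.
# Master data: the controlled-vocabulary value (Dewey class 514, Topology).
triple = ("book", "subject heading", "514")

cls, attribute, value = triple
print(f"<{cls}><{attribute}><{value}>")  # the metacontent statement

# A tiny controlled vocabulary mapping Dewey class numbers to labels.
dewey = {"514": "Topology", "515": "Analysis"}
print(dewey[value])  # "Topology"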

Hierarchical, linear and planar schemata

Metadata schema can be hierarchical in nature, where relationships exist between metadata elements and elements are nested so that parent-child relationships exist between the elements. An example of a hierarchical metadata schema is the IEEE LOM schema, in which metadata elements may belong to a parent metadata element. Metadata schema can also be one-dimensional, or linear, where each element is completely discrete from other elements and classified according to one dimension only. An example of a linear metadata schema is the Dublin Core schema, which is one-dimensional. Metadata schema are often two-dimensional, or planar, where each element is completely discrete from other elements but classified according to two orthogonal dimensions.[12]
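As a loose illustration only, the Python records below contrast the two layouts; the element names are invented and are not taken from IEEE LOM or Dublin Core.

# Hierarchical schema: elements nest under parent elements (parent-child relationships).
hierarchical_record = {
    "general": {
        "title": "Data Structures Notes",
        "language": "en",
    },
    "lifecycle": {
        "version": "1.0",
        "contributors": ["J. Smith"],
    },
}

# Linear (one-dimensional) schema: every element is discrete, with no nesting.
linear_record = {
    "title": "Data Structures Notes",
    "language": "en",
    "creator": "J. Smith",
    "date": "2013-10-31",
}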

Metadata hypermapping

In all cases where the metadata schemata exceed the planar depiction, some type of hypermapping is required to enable display and view of metadata according to a chosen aspect and to serve special views. Hypermapping frequently applies to layering of geographical and geological information overlays.[13]

Granularity

The degree to which data or metadata is structured is referred to as its granularity. Metadata with a high granularity allows for deeper structured information and enables greater levels of technical manipulation; however, a lower level of granularity means that metadata can be created for considerably lower costs but will not provide as detailed information. The major impact of granularity is not only on creation and capture, but moreover on maintenance. As soon as the metadata structures become outdated, so does access to the data they refer to. Hence granularity should take into account the effort to create the metadata as well as the effort to maintain it.

(4) Assembly language

An assembly language is a low-level programming language for a computer, microcontroller, or other programmable device, in which each statement corresponds to a single machine code instruction. Each assembly language is specific to a particular computer architecture, in contrast to most high-level programming languages, which are generally portable across multiple systems.

Assembly language is converted into executable machine code by a utility program referred to as an assembler; the conversion process is referred to as assembly, or assembling the code.

Assembly language uses a mnemonic to represent each low-level machine operation or opcode. Some opcodes require one or more operands as part of the instruction, and most assemblers can take labels and symbols as operands to represent addresses and constants, instead of hard coding them into the program. Macro assemblers include a macroinstruction facility so that assembly language text can be pre-assigned to a name, and that name can be used to insert the text into other code. Many assemblers offer additional mechanisms to facilitate program development, to control the assembly process, and to aid debugging.

Key concepts

Assembler

An assembler creates object code by translating assembly instruction mnemonics into opcodes, and by resolving symbolic names for memory locations and other entities.[1] The use of symbolic references is a key feature of assemblers, saving tedious calculations and manual address updates after program modifications. Most assemblers also include macro facilities for performing textual substitution, e.g., to generate common short sequences of instructions inline, instead of as called subroutines.

Assemblers have been available since the 1950s and are far simpler to write than compilers for high-level languages, as each mnemonic instruction / address mode combination translates directly into a single machine language opcode. Modern assemblers, especially for RISC architectures such as SPARC or POWER, as well as x86 and x86-64, optimize instruction scheduling to exploit the CPU pipeline efficiently.

Number of passes

There are two types of assemblers based on how many passes through the source are needed to produce the executable program.



  - One-pass assemblers go through the source code once. Any symbol used before it is defined will require "errata" at the end of the object code (or, at least, no earlier than the point where the symbol is defined), telling the linker or the loader to "go back" and overwrite a placeholder which had been left where the as-yet-undefined symbol was used.
  - Multi-pass assemblers create a table with all symbols and their values in the first passes, then use the table in later passes to generate code.

In both cases, the assembler must be able to determine the size of each instruction on the initial passes in order to calculate the addresses of subsequent symbols. This means that if the size of an operation referring to an operand defined later depends on the type or distance of the operand, the assembler will make a pessimistic estimate when first encountering the operation, and if necessary pad it with one or more "no-operation" instructions in a later pass or the errata. In an assembler with peephole optimization, addresses may be recalculated between passes to allow replacing pessimistic code with code tailored to the exact distance from the target.
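As a rough sketch of the multi-pass idea, the Python snippet below builds a symbol table on a first pass and resolves symbolic operands on a second pass; the toy "instruction set", the one-word instruction size, and the label convention (a line ending in ':') are all invented purely for illustration.

# Toy two-pass "assembler": every instruction occupies one word of memory.
source = [
    "start:",
    "LOAD counter",
    "JUMP start",
    "counter:",
    "WORD 0",
]

# Pass 1: record the address of every label in a symbol table.
symbols, address = {}, 0
for line in source:
    if line.endswith(":"):
        symbols[line[:-1]] = address
    else:
        address += 1  # fixed instruction size makes address calculation easy

# Pass 2: emit "code" with symbolic operands resolved to addresses.
code = []
for line in source:
    if line.endswith(":"):
        continue
    op, *operands = line.split()
    code.append((op, [symbols.get(o, o) for o in operands]))

print(symbols)  # {'start': 0, 'counter': 2}
print(code)     # [('LOAD', [2]), ('JUMP', [0]), ('WORD', ['0'])]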

The original reason for the use of one-pass assemblers was speed of assembly: often a second pass would require rewinding and rereading a tape or rereading a deck of cards. Modern computers perform multi-pass assembly without unacceptable delay. The advantage of the multi-pass assembler is that the absence of errata makes the linking process (or the program load if the assembler directly produces executable code) faster.[2]

High-level assemblers

More sophisticated high-level assemblers provide language abstractions such as:



  - Advanced control structures
  - High-level procedure/function declarations and invocations
  - High-level abstract data types, including structures/records, unions, classes, and sets
  - Sophisticated macro processing (although available on ordinary assemblers since the late 1950s for the IBM 700 series and since the 1960s for the IBM/360, amongst other machines)
  - Object-oriented programming features such as classes, objects, abstraction, polymorphism, and inheritance[3]

See Language design below for more details.

Assembly language

A program written in assembly language consists of a series of (mnemonic) processor instructions and meta-statements (known variously as directives, pseudo-instructions and pseudo-ops), comments and data. Assembly language instructions usually consist of an opcode mnemonic followed by a list of data, arguments or parameters.[4] These are translated by an assembler into machine language instructions that can be loaded into memory and executed.

For example, the instruction below tells an x86/IA-32 processor to move an immediate 8-bit value into a register. The binary code for this instruction is 10110 followed by a 3-bit identifier for which register to use. The identifier for the AL register is 000, so the following machine code loads the AL register with the data 01100001.[5]

10110000 01100001

This binary computer code can be made more human-readable by expressing it in hexadecimal as follows.

B0 61

Here, B0 means 'Move a copy of the following value into AL', and 61 is a hexadecimal representation of the value 01100001, which is 97 in decimal. Intel assembly language provides the mnemonic MOV (an abbreviation of move) for instructions such as this, so the machine code above can be written as follows in assembly language, complete with an explanatory comment if required, after the semicolon. This is much easier to read and to remember.

MOV AL, 61h       ; Load AL with 97 decimal (61 hex)

In some assembly languages the same mnemonic, such as MOV, may be used for a family of related instructions for loading, copying and moving data, whether these are immediate values, values in registers, or memory locations pointed to by values in registers. Other assemblers may use separate opcodes such as L for "move memory to register", ST for "move register to memory", LR for "move register to register", MVI for "move immediate operand to memory", etc.

The Intel opcode 10110000 (B0) copies an 8-bit value into the AL register, while 10110001 (B1) moves it into CL and 10110010 (B2) does so into DL. Assembly language examples for these follow.[5]

MOV AL, 1h        ; Load AL with immediate value 1
MOV CL, 2h        ; Load CL with immediate value 2
MOV DL, 3h        ; Load DL with immediate value 3
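To make the relationship between the opcode bits and the register identifiers above concrete, here is a small Python sketch that assembles these MOV reg, imm8 instructions into their two machine-code bytes; it covers only the three registers named in the text.

# "MOV reg8, imm8": opcode byte 0xB0 plus a 3-bit register id, then the immediate byte.
# Register ids follow the text above: AL = 000, CL = 001, DL = 010.
REG8 = {"AL": 0b000, "CL": 0b001, "DL": 0b010}

def mov_imm8(reg: str, value: int) -> bytes:
    """Assemble a single MOV reg, imm8 instruction into two bytes."""
    return bytes([0b10110000 | REG8[reg], value & 0xFF])

print(mov_imm8("AL", 0x61).hex())  # 'b061' -> MOV AL, 61h
print(mov_imm8("CL", 0x02).hex())  # 'b102' -> MOV CL, 2h
print(mov_imm8("DL", 0x03).hex())  # 'b203' -> MOV DL, 3h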

The syntax of MOV can also be more complex, as the following examples show.[6]

MOV EAX, [EBX]    ; Move the 4 bytes in memory at the address contained in EBX into EAX
MOV [ESI+EAX], CL ; Move the contents of CL into the byte at address ESI+EAX

In each case, the MOV mnemonic is translated directly into an opcode in the ranges 88-8E, A0-A3, B0-B8, C6 or C7 by an assembler, and the programmer does not have to know or remember which.[5]

Transforming assembly language into machine code is the job of an assembler, and the reverse can at least partially be achieved by a disassembler. Unlike high-level languages, there is usually a one-to-one correspondence between simple assembly statements and machine language instructions. However, in some cases, an assembler may provide pseudoinstructions (essentially macros) which expand into several machine language instructions to provide commonly needed functionality. For example, for a machine that lacks a "branch if greater or equal" instruction, an assembler may provide a pseudoinstruction that expands to the machine's "set if less than" and "branch if zero (on the result of the set instruction)". Most full-featured assemblers also provide a rich macro language (discussed below) which is used by vendors and programmers to generate more complex code and data sequences.
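As a rough sketch of such an expansion, the Python snippet below rewrites a "branch if greater or equal" pseudoinstruction into the two real instructions described above; the mnemonics BGE, SLT and BEQZ and the temporary register stand in for a generic machine and are not taken from any particular instruction set.

def expand_pseudo(instr: str) -> list:
    """Expand a 'branch if greater or equal' pseudoinstruction into real instructions."""
    op, *args = instr.replace(",", " ").split()
    if op == "BGE":                     # BGE rs, rt, label
        rs, rt, label = args
        return [
            f"SLT tmp, {rs}, {rt}",     # tmp = 1 if rs < rt else 0 (set if less than)
            f"BEQZ tmp, {label}",       # branch when rs >= rt (tmp == 0)
        ]
    return [instr]                      # real instructions pass through unchanged

print(expand_pseudo("BGE r1, r2, loop"))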

Each computer architecture has its own machine language. Computers differ in the number and type of operations they support, in the different sizes and numbers of registers, and in the representations of data in storage. While most general-purpose computers are able to carry out essentially the same functionality, the ways they do so differ; the corresponding assembly languages reflect these differences.

Multiple sets of mnemonics or assembly-language syntax may exist for a single instruction set, typically instantiated in different assembler programs. In these cases, the most popular one is usually that supplied by the manufacturer and used in its documentation.

Language design

Basic elements

There is a large degree of diversity in the way the authors of assemblers categorize statements and in the nomenclature that they use. In particular, some describe anything other than a machine mnemonic or extended mnemonic as a pseudo-operation (pseudo-op). A typical assembly language consists of three types of instruction statements that are used to define program operations:

  - Opcode mnemonics
  - Data sections
  - Assembly directives

Opcode mnemonics and extended mnemonics

Instructions (statements) in assembly language are generally very simple, unlike those in high-level languages. Generally, a mnemonic is a symbolic name for a single executable machine language instruction (an opcode), and there is at least one opcode mnemonic defined for each machine language instruction. Each instruction typically consists of an operation or opcode plus zero or more operands. Most instructions refer to a single value, or a pair of values. Operands can be immediate (value coded in the instruction itself), registers specified in the instruction or implied, or the addresses of data located elsewhere in storage. This is determined by the underlying processor architecture: the assembler merely reflects how this architecture works. Extended mnemonics are often used to specify a combination of an opcode with a specific operand, e.g., the System/360 assemblers use B as an extended mnemonic for BC with a mask of 15 and NOP for BC with a mask of 0.

Extended mnemonics are often used to support specialized uses of instructions, often for purposes not obvious from the instruction name. For example, many CPUs do not have an explicit NOP instruction, but do have instructions that can be used for the purpose. In 8086 CPUs the instruction xchg ax,ax is used for nop, with nop being a pseudo-opcode to encode the instruction xchg ax,ax. Some disassemblers recognize this and will decode the xchg ax,ax instruction as nop. Similarly, IBM assemblers for System/360 and System/370 use the extended mnemonics NOP and NOPR for BC and BCR with zero masks. For the SPARC architecture, these are known as synthetic instructions.[7]

Some assemblers also support simple built-in macro-instructions that generate two or more machine instructions. For instance, with some Z80 assemblers the instruction ld hl,bc is recognized to generate ld l,c followed by ld h,b.[8] These are sometimes known as pseudo-opcodes.

Data sections

There are instructions used to define data elements to hold data and variables. They define the type of data, the length and the alignment of data. These instructions can also define whether the data is available to outside programs (programs assembled separately) or only to the program in which the data section is defined. Some assemblers classify these as pseudo-ops.

Assembly directives

Assembly directives, also called pseudo opcodes, pseudo-operations or pseudo-ops, are instructions that are executed by an assembler at assembly time, not by a CPU at run time. They can make the assembly of the program dependent on parameters input by a programmer, so that one program can be assembled in different ways, perhaps for different applications. They can also be used to manipulate the presentation of a program to make it easier to read and maintain. (For example, directives would be used to reserve storage areas and optionally set their initial contents.) The names of directives often start with a dot to distinguish them from machine instructions.

Symbolic assemblers let programmers associate arbitrary names (labels or symbols) with memory locations. Usually, every constant and variable is given a name so instructions can reference those locations by name, thus promoting self-documenting code. In executable code, the name of each subroutine is associated with its entry point, so any calls to a subroutine can use its name. Inside subroutines, GOTO destinations are given labels. Some assemblers support local symbols which are lexically distinct from normal symbols (e.g., the use of "10$" as a GOTO destination).

Some assemblers, such as NASM, provide flexible symbol management, letting programmers manage different namespaces, automatically calculate offsets within data structures, and assign labels that refer to literal values or the result of simple computations performed by the assembler. Labels can also be used to initialize constants and variables with relocatable addresses.

Assembly languages, like most other computer languages, allow comments to be added to assembly source code that are ignored by the assembler. Good use of comments is even more important with assembly code than with higher-level languages, as the meaning and purpose of a sequence of instructions is harder to decipher from the code itself.

Wise use of these facilities can greatly simplify the problems of coding and maintaining low-level code. Raw assembly source code as generated by compilers or disassemblers (code without any comments, meaningful symbols, or data definitions) is quite difficult to read when changes must be made.