Summary IT Architectures and Middleware




By Niels Krijger (niels@kryger.nl)

Note: this summary is a bit of overkill for this course (as of 29-6-2010). Check the sheets and this summary, make sure you understand the main concepts, and practice the example exam questions!

Contents

Chapter 1: The problem
Chapter 2: The emergence of standard middleware
Chapter 3: Objects, components, and the web
Chapter 4: Web services
Chapter 5: A technical summary of middleware
Chapter 6: Using middleware to build distributed applications
Chapter 7: Resiliency
Chapter 8: Performance and scalability
Chapter 9: Systems management
Chapter 10: Security
Chapter 11: Application design and IT Architecture
Chapter 12: Implementing business processes
Chapter 13: Integration design
Chapter 14: Information access and information accuracy
Chapter 15: Changing and integrating applications
Chapter 16: Building an IT Architecture


Chapter 1: The problem

IT architecture = a solution to the problem "How do I structure my IT applications to best suit my business?". An architecture identifies the components of a problem space, shows the relationships among them, and defines technology, rules, and constraints on the relationships and components.

A typical architecture is shown in figure 1-4. The environment has three parts: presentation, application and data.

Middleware = software that is necessary in practice to build distributed applications. An important characteristic is that it can work over a network (but does not necessarily do so at all times!).



Eight elements to consider in middleware (see figure 1-5):

1. The communications link (enables data transfer: the medium)
2. The protocol (enables communication: the language)
3. The programmatic interface
4. A common data format
5. Server process control: manages scheduling
6. Naming/directory services: manages finding others on the network
7. Security: ensures safe communication
8. Systems management: keeps the whole thing operating properly (fault mgt, performance, ...)

Silo applications = stand-alone applications that are generally difficult to integrate.

Why still use silos? 1) Don't lose power, 2) self-contained projects are easy to control, 3) development methodologies are silo based, 4) fear of big integrated systems, 5) fear of changing large existing applications.

Surround architecture = an interface 'surrounding' the silo applications. Includes a Hub towards presentation devices and a Merge to create a consolidated data view.

There are two options to change software:

1. Substantially change the existing application (new interface)
2. A rewrite

When to rewrite?

1. Serious issues with support
2. Business functionality changed

Eventually all applications need to evolve. You can split big rewrite jobs into smaller rewrites and gradually evolve.

To build an architecture you need:

- Technologists
- Application designers
- Business representatives

Key lessons chapter 1:

- Make an architecture that adapts easily to changing needs
- Architecture separates application logic from any representation by access channels
- Follow an evolutionary approach when implementing the architecture
- Ensure technical issues are addressed up front to avoid operationally unusable solutions
- Do not finish the architecture and leave it on a shelf.



Chapter 2: The emergence of standard middleware

TCP/IP = a set of standards developed by the military that became popular because ARPANET (which used TCP/IP) became so popular and evolved into the worldwide Internet.

Requirements for middleware:

1. Ease of use (compared to writing it yourself using low-level APIs like sockets)
2. Location transparency (not having to know the network and address)
3. Message delivery integrity (not lost or duplicated)
4. Message format integrity (not corrupted)
5. Language transparency (communicate with programs in other languages)

RPC = Remote Procedure Calls: the syntax of the client (caller) and the server (the called) programs remains the same, just as if they were on the same machine. Examples: Open Network Computing (ONC) from Sun and Distributed Computing Environment (DCE) from OSF. It works like this: you write an Interface Definition Language (IDL) file. The IDL generates a client stub and a server skeleton. The stub converts parameters into a string of bits and the skeleton converts them back. This parameter conversion is called marshalling, see figure 2-4. The word marshalling is slowly being replaced by serialization. This process does require multithreading, otherwise the client is left waiting.

Advantages: programming as if no server exists.
Disadvantages: the caller is blocked while waiting for the server response (alleviate with multithreading), plus many clients create overhead.
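To make the stub/skeleton idea concrete, here is a minimal sketch using Java RMI, which follows the same pattern as the RPC model described above (the interface plays the role of the IDL; the host name, registry port and service name are assumptions made up for the example):

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;

// The interface plays the role of the IDL: it defines what the client may call.
// The middleware generates/uses a client stub and a server-side skeleton that
// marshal (serialize) the parameters and results across the network.
public interface AccountService extends Remote {
    double getBalance(String accountId) throws RemoteException;
}

// Client side: look up the remote object and call it as if it were local.
// The call blocks until the server replies (the RPC disadvantage noted above).
class Client {
    public static void main(String[] args) throws Exception {
        Registry registry = LocateRegistry.getRegistry("server-host", 1099); // hypothetical host
        AccountService service = (AccountService) registry.lookup("AccountService");
        System.out.println(service.getBalance("12345"));
    }
}
```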

Remote database access = the ability to read and write to a database on a different machine. Two approaches: SQL or disguising the remote database as a local database. It creates large overhead on the network. Stored procedures are a way to speed up remote database access. There are loads of remote database access technologies: ODBC, OLE DB, ADO, ADO.NET, JDBC, JDO, etc.

Disadvantages: poor performance on transaction processing.
Advantages: excellent at processing ad-hoc queries on remote databases.

Distributed transaction processing middleware synchronizes transactions in multiple databases across the network.

Transactions are used to ensure execution of all database commands or none at all. They should adhere to ACID:

- Atomic = a transaction is never half done
- Consistent = the database constraints hold true before and after execution of the command
- Isolation = data updates are not visible until the entire transaction is done
- Durable = the transaction is truly done; updates don't disappear for some magical reason in the future

Transactions ensure the ACID properties with database locks and rollback mechanisms.
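As a minimal sketch of how an application uses these commit and rollback mechanisms, here is a plain JDBC transaction; the connection URL, table and column names are assumptions invented for the example:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

// A transfer between two accounts: either both updates happen or neither does
// (atomicity). The rollback in the catch block is the mechanism the text refers to.
public class TransferExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection URL and credentials.
        try (Connection con = DriverManager.getConnection("jdbc:postgresql://db-host/bank", "app", "secret")) {
            con.setAutoCommit(false); // start an explicit transaction
            try (PreparedStatement debit = con.prepareStatement(
                     "UPDATE account SET balance = balance - ? WHERE id = ?");
                 PreparedStatement credit = con.prepareStatement(
                     "UPDATE account SET balance = balance + ? WHERE id = ?")) {
                debit.setBigDecimal(1, new java.math.BigDecimal("100.00"));
                debit.setLong(2, 1L);
                debit.executeUpdate();
                credit.setBigDecimal(1, new java.math.BigDecimal("100.00"));
                credit.setLong(2, 2L);
                credit.executeUpdate();
                con.commit();            // make both updates durable
            } catch (Exception e) {
                con.rollback();          // undo everything: the transaction is never half done
                throw e;
            }
        }
    }
}
```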

Transactions are the steps that business processes take. If someone changes one step into two smaller steps, or adds or removes a step, they change the business process.

Message queuing = program-to-message-queue communication: a very fast mailbox, where the recipient does not have to be active. You Put messages in the queue, and another program does a Get. This is especially useful when applications only require sub-second response times. Best known is WebSphere MQ.

Disadvantages: include no IDL and no marshalling, so the receiver must know the message layout. Also, transactions might not be isolated; others can get in between.

Advantages: the network can go down without problems, it doesn't need two-phase commit (see page 141 for details), and it offers secure delivery.

Message queuing can be synchronized with database transactions, making it possible to build systems with good levels of message integrity.
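A loose illustration of the Put side of a queued interaction, using the JMS 2.0 API; the ConnectionFactory and Queue would normally come from a JNDI lookup, which is omitted here, so treat them as given:

```java
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.jms.TextMessage;

// Sender side of a queued ("Put") interaction. The receiving program can be
// down; the message waits in the queue until that program does a "Get" (receive).
public class OrderSender {
    public void send(ConnectionFactory factory, Queue orderQueue, String payload) throws Exception {
        try (Connection connection = factory.createConnection()) {
            Session session = connection.createSession(true, Session.SESSION_TRANSACTED);
            MessageProducer producer = session.createProducer(orderQueue);
            TextMessage message = session.createTextMessage(payload); // no IDL: the layout is by convention
            producer.send(message);
            session.commit(); // the send can be tied to a local transaction
        }
    }
}
```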

SQL parsing: a SQL text is sent to the server, which parses it into a query plan. The client receives a Query Output Description and sends the execute command plus some parameters, after which the query result is sent. To speed this process up the query plan can be cached.
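In JDBC terms this parse-once/execute-many pattern looks roughly like the sketch below (table and column names are assumptions):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Prepare once (the server parses the SQL into a plan), then execute many times
// with different parameters, so the parse/plan step is not repeated on every call.
public class PreparedQueryExample {
    public void printBalances(Connection con, long[] accountIds) throws Exception {
        try (PreparedStatement stmt = con.prepareStatement(
                "SELECT balance FROM account WHERE id = ?")) {
            for (long id : accountIds) {
                stmt.setLong(1, id);                 // send only the parameter, not new SQL text
                try (ResultSet rs = stmt.executeQuery()) {
                    while (rs.next()) {
                        System.out.println(id + ": " + rs.getBigDecimal(1));
                    }
                }
            }
        }
    }
}
```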

Chapter 3: Objects, components, and the web

OO middleware: at the end of the 1980s Object Oriented middleware came into existence. An example is the Common Object Request Broker Architecture (CORBA). It basically calls methods on objects remotely, very similar to RPC. You no longer have client-server but client-object. Two steps are required: 1) getting a reference to the object, and 2) calling an operation on the object. It may require an IDL (Interface Definition Language) file that creates stubs and skeletons. However, most OO middleware uses a macro language that provides the interface (in RPC the concept of an interface is less common).

Disadvantages: complex, plus interoperability between CORBA implementations is low.

Advantages: it fits naturally in OO languages. It is more flexible: the interface is delinked from the server; implementation and specification are detached.

Problems in OO middleware are:

1. How to get an object reference?
   Three ways: 1) A special object reference is returned to the client when it first attaches to the middleware. 2) The client calls a special "Naming" service that takes a name; the directory returns the location of the object. 3) An operation on one object returns a reference to another object (this cannot be used to retrieve the first object).

2. When are objects created and deleted?

3. Is it a good idea for more than one client to share an object?

Component technology: an OO component creates one or more objects and then makes the interface of some or all of these objects available to the world outside the component. Examples: Component Object Model (COM) from Microsoft and Enterprise JavaBeans from Sun.

Transactional component middleware = makes transaction processing systems easier to implement and more scalable. It provides a container with many features, most notably transaction support and resource pooling. A component is placed in such a container (i.e. moved to a file directory where it can be accessed and registered by the container). The container contains an object wrapper that is called by the client to call the component. Because of this the container can take care of security and memory efficiency.



In COM+ you can declare to deactivate an object after a transaction or operation; deactivation means elimination. COM+ does not store anything on a session basis, a feature that is typically used for storing temporary data.

Enterprise JavaBeans (EJB) is actually a standard, not a product, and thus has several implementations. There are two types: 1) stateless session beans, whose state is eliminated after every operation invocation, and 2) stateful session beans, which hold state for their entire life. The container decides what happens. Often values are cached in beans to improve performance, but this does destroy transaction integrity.

Disadvantages COM+: only works on Windows.
Disadvantages EJB: only works using Java.
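A minimal sketch of the two bean types, assuming a Java EE container with the javax.ejb annotations (class names and logic are illustrative, not from the book):

```java
import javax.ejb.Remove;
import javax.ejb.Stateful;
import javax.ejb.Stateless;

// A stateless session bean: the container may give each call a different instance,
// so nothing may be remembered between invocations.
@Stateless
public class QuoteService {
    public double priceFor(String productCode, int quantity) {
        return quantity * 9.99; // placeholder pricing logic
    }
}

// A stateful session bean: the container keeps one instance per client,
// so the shopping cart survives across calls until the client removes it.
@Stateful
class ShoppingCart {
    private final java.util.List<String> items = new java.util.ArrayList<>();

    public void add(String productCode) { items.add(productCode); }
    public java.util.List<String> contents() { return items; }

    @Remove
    public void checkout() { /* place the order; the container then discards this bean */ }
}
```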

Internet applications

These differ from normal applications in:

1. The user is in command (back buttons, favorites, explicit URL addresses, etc.)
2. Not all users have the same connection, PC, etc.
3. You cannot identify the user by their network address
4. It is a public medium, with security being a major concern
5. Many short interactions by many users make many internet applications painfully slow

Using sessions or cookies you can store information for longer when accessing a web application. In the olden days the session was between the workstation and the application; now it is between cookie and transaction server. If the client or connection fails there is no problem, because everything remains stored, both the session and the data (in the web server). Stateless sessions are to be preferred.
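A minimal servlet sketch of cookie-based session state (the servlet and attribute names are invented for the example): the container sends the browser a session cookie and uses it to find the same session object on the next request.

```java
import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpSession;

// The container ties the HttpSession to a cookie (e.g. JSESSIONID), so the server
// can keep per-user data between the short, stateless HTTP interactions.
public class CartServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        HttpSession session = req.getSession(true);           // create the session if there is none yet
        Integer visits = (Integer) session.getAttribute("visits");
        visits = (visits == null) ? 1 : visits + 1;
        session.setAttribute("visits", visits);
        resp.getWriter().println("Visit number " + visits);
    }
}
```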

KISS = Keep It Simple, Stupid

Chapter 4: Web services

Services are easy to understand: you have a requester requesting a service, and a provider fulfilling the request. A web browser (or other PC software) acts as a proxy on the user's behalf.

In an IT context programs can be providers of services to other programs. A provider can itself access other services to provide its own service.

A big problem is what to define as a service: what characteristics must it have? Many definitions exist; the book follows one by describing the attributes.

A service provider's attributes:

1. Independent of any requester; it is a black box
2. Verifiable identity (name) and interfaces
3. Possible to replace the existing implementation with a new version while maintaining backwards compatibility
4. Can be located
5. Can be invoked by requesters of its services and can invoke services itself
6. Contains mechanisms for recovering from errors (ACID)

Although a service provider's internal structure is unknown to requesters of its service, a service-oriented architecture (SOA) may be used within the provider itself. This gives a two-level approach in which external services are implemented by internal services.

If you look at the internet, it is very service oriented. Users act as requesters and internet applications provide services for them.

The W3C is responsible for many of the web standards and architectures. XML plays a major role in providing services because it is platform, vendor, language, and implementation independent.

The key messaging technology is SOAP: Simple Object Access Protocol. It is based on XML and is typically carried over HTTP. A simple example of SOAP is on page 68. A SOAP message comprises an envelope and a body which contains the application payload. You define the namespace which defines a collection of names on which you build your SOAP message body. Optionally, processing information can be set in a header. With it you can build complex sequences of interactions (authentication, encryption, transactions, etc.).
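As a rough illustration, the SAAJ API that ships with Java can build such an envelope programmatically; the namespace and element names below are made up for the example:

```java
import javax.xml.namespace.QName;
import javax.xml.soap.MessageFactory;
import javax.xml.soap.SOAPBody;
import javax.xml.soap.SOAPElement;
import javax.xml.soap.SOAPMessage;

// Builds an envelope with a body element in an (assumed) application namespace;
// writeTo() shows the XML that would travel over HTTP.
public class SoapExample {
    public static void main(String[] args) throws Exception {
        SOAPMessage message = MessageFactory.newInstance().createMessage();
        SOAPBody body = message.getSOAPBody();
        QName request = new QName("http://example.com/orders", "GetOrderStatus", "ord"); // hypothetical namespace
        SOAPElement element = body.addBodyElement(request);
        element.addChildElement("orderId", "ord").addTextNode("12345");
        message.saveChanges();
        message.writeTo(System.out); // prints the SOAP envelope as XML
    }
}
```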

WSDL provides the description of services in an XML document.

UDDI (Universal Description, Discovery and Integration) is a protocol used to find ('discover') services. It contains white pages (contact information), yellow pages (description of the business) and green pages (technical information).

There remains a lot of work to be done to make sure different providers and requesters understand and interpret each other equally; they need one ontology. The W3C is working on the Web Ontology Language (OWL -> not WOL!) to improve integration.

To avoid UDDI many groups are just using SOAP and agreed-upon structures, avoiding the complications of extensive discovery of services.

If you go to the extreme, an organization could outsource all of its IT functions via web services. However, many small organizations providing parts of the IT environment does bring other problems (reliability, continuity, etc.).

Chapter 5: A technical summary of middleware

Answers the question "What middleware do we need?" by describing the eight elements of middleware (see page 78).

1. The communications link: most are restricted to TCP/IP. You might not need the extra services offered by TCP/IP (e.g. DNS for converting names into IP addresses). There are two types of protocols: with or without connections. Without connections it is much like writing a letter, putting the receiver's address on it, and hoping it finds its destination. UDP is connectionless. TCP is a connection protocol and offers useful features: 1) no message loss, 2) messages received in the right order, 3) no message corruption, 4) no message duplication.

2. The middleware protocol: middleware protocols are generally dialogs and thus require connections. E.g. client/server (many-to-1), peer-to-peer (1-to-1), and push protocols (1-to-many, e.g. publish and subscribe). To build a connectionless protocol you can use SOAP and headers. Two-phase commit between requester and provider cannot be implemented through message header information; it requires additional messages to flow between the parties.

3. The programmatic interface: a set of procedure calls used by a program. There can be huge variation; three classifications are possible:

(1) By what entities are communicating (terminals and mainframes, processes with processes, clients with message queues, etc.). Historically, the entities communicating with each other have become smaller and smaller.

(2) By the nature of the interface, of which there are basically two types: APIs and GPIs. An Application Programming Interface is a fixed set of procedure calls. Generated Programming Interfaces either generate the interface from the component source or from a separate file written in an IDL. Within APIs there are three styles of interface: 1) message-based, 2) command-language based (the command is encoded in a language, e.g. SQL), and 3) operation-call based (the name of the server operation and its parameters are built up by a series of middleware procedure calls, e.g. COM+).

Many middleware products have both API and GPI interfaces. API is for interpreters and GPI is for component builders.

(3) According to the impact on process thread control. Either 1) blocked (synchronous): the thread stops until the reply arrives, 2) unblocked (asynchronous): the client every now and then looks to see whether the reply has arrived, or 3) event-based: the client is woken up when the reply arrives. (A small Java sketch of these three styles follows this numbered list.)

4. Data presentation: both sender and receiver must know the structure of the message and their character encoding, for example. Most middleware takes care of that by using serialization (marshalling).

5. Server control: breaks down into three main tasks:
   (1) Process and thread control: a server must be running to receive messages; you might need a load balancer that is capable of processing it.
   (2) Resource management: e.g. database connection pooling.
   (3) Object management: objects may be activated or deactivated.
   Web service standards have nothing to say about server control.

6. Naming and directory services: typically IP address or DNS. Directory services go one step further and provide look-up functions (e.g. Microsoft Active Directory).

7. Security: access control, encryption.

8. System management: a human interface to all this software for operational control, debugging, monitoring, and configuration control.
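The sketch promised under element 3: the three thread-control styles in plain Java, using CompletableFuture to stand in for a middleware call (the callServer method is hypothetical):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Illustrates the three thread-control styles with a made-up remote call.
public class CallStyles {
    static CompletableFuture<String> callServer() {             // stands in for a middleware call
        return CompletableFuture.supplyAsync(() -> "reply");
    }

    public static void main(String[] args) throws Exception {
        // 1) Blocked (synchronous): the thread stops until the reply arrives.
        String reply = callServer().get();

        // 2) Unblocked (asynchronous): carry on, poll now and then for the reply.
        CompletableFuture<String> pending = callServer();
        while (!pending.isDone()) {
            TimeUnit.MILLISECONDS.sleep(10);                     // do other work, then look again
        }

        // 3) Event-based: register a callback; the client is "woken up" when the reply arrives.
        callServer().thenAccept(r -> System.out.println("got " + r));
        System.out.println(reply + " / " + pending.get());
    }
}
```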

Because of these aspects, and especially the huge number of programmatic interfaces, there is a huge variety of middleware.

Vendor architectures deal with which and how many of the middleware products you need. Currently there are two big ones: .NET and J2EE. Both are interpreters; .NET uses the Common Language Runtime and J2EE the Java Virtual Machine. See page 88 for the .NET architecture and page 89 for J2EE. Generally .NET interprets more coding languages and J2EE runs on more platforms.

Vendor architectures serve a number of functions, of which three are discussed in the book:


1) Positioning: a well-presented architecture lets you see at a glance what elements you need to select to make a working system. The user knows what he is involved in.

2) Strawman for user target architecture: architectures tell users how functionality should be split (e.g. into layers such as presentation, business logic, and data).

3) Marketing: it shows you how to develop, it provides a strategy. The problem is that when you explain an architecture you are explaining very complex software.

Many vendors don't explicitly name their architecture; many organizations buy SAP or Oracle and buy in on their architecture. When you have software based on different architectures you might need middleware interoperability: building software to bridge different middleware technologies.

A Hub or Gateway, 'Enterprise Application Integration (EAI)' software, can achieve this (sometimes this is packaged with an application directly). The main question is whether it is safe. For example, application A sends a message to the hub, which in turn calls app B using Java Remote Method Invocation (RMI). Normally A is guaranteed that the message arrives, and arrives only once; but does a hub provide these guarantees? One solution is a two-phase commit spanning apps A and B. Alternatively, the transaction can have a unique identifier and the hub calls B again if it did not reply.

The same problem applies for connection loss. Either A and B are handled in a single transaction, meaning two-phase commit and synchronizing queues, or it is handled at the app level (e.g. A asks B "is the transaction done?"). Even more complex is a stateful session between A and B, because a message queue does not have a concept of a session. To solve this, generally a session ID is used.

Chapter 6: Using middleware to build distributed applications

From a user's perspective, there are four groups of distributed processing technology:

1. Transaction technology
2. Information retrieval technology
3. Collaborative technology (e.g. email)
4. Internal IT distributed services (software distribution or remote systems operations)

There are two types of messages (or three, if you count inquiries separately):

1. Real-time: inquiries or 'action now'.

2. Deferrable: in business processes, actions by others require messages that are deferrable (capable of being postponed), often implemented using asynchronous message technology. You cannot translate a real-time transaction into a deferrable transaction without a lot of thought. It is often simpler to first do a real-time transaction, and if that fails do a deferrable one.

Middleware choices for real time include RPC, CORBA, EJB, COM+, Tuxedo, and SOAP. It is generally not recommended to use message queuing for real-time processing because:

1. Two transaction servers with message queuing cannot support distributed transaction processing.
2. Real-time calls have an end user waiting for the reply; they have a time-out if it fails. With message queuing the user may go away if it takes too long, but eventually the output response is sent, often ending up in a "dead letter box".
3. There can be an enormous number of queues; when you have thousands of users you end up with thousands of queues.
4. Queues have no IDL and no control of message format.
5. For high performance you need to write your own scheduler.

An alternative view, processing deferrable messages in real time, also has problems:

1. It's slower; messages cannot be buffered.
2. If the destination server is down the calling server cannot operate.
3. Message-queuing software has various hooks that can be used to automate operational tasks.

In most cases, real-time transactions are used between user and server, and deferrable transactions between databases.

Information retrieval is positioned along four dimensions: 1) timeliness (the speed), 2) usability (raw data, fragmented information, inconsistencies, etc.), 3) degree of flexibility of the query (from only an ID up to complex SQL queries), and 4) whether the user wants to get the data or wants to be informed when something changes (time-based push, event-based push, or pull).

Distributed system software is converging on a technical level (using your TV set-top box to pay your bills) and on an interface level (using email as a reporting mechanism).

There are three program tiers:

1. Presentation tier: especially the banking industry allows doing your banking using different devices: an ATM, the web, a bank clerk, the phone, etc. Whatever the channel, there are only a few types of messages for the back-end server: real-time, deferrable, ad-hoc queries, simple push messages, and reports.

2. Processing tier: the programming glue between interface and database. It contains the decision logic that takes the input request and decides what to do with it (e.g. the business rules). The processing tier should support many small messages or a few big ones (e.g. filling in an order part by part or all at once). This becomes more troublesome when session state and recovery issues apply. The right order lines need to end up in the correct order, especially when small messages come from different channels. Generally it is easier to make the inward session (the processing tier interface) session-less.

3. Data tier: basically the database. You have to decide whether to run it locally (which is safer) or on a separate server (which creates a lot of network overhead). Running it locally also has the benefit that when the database schema changes you can identify all instances accessing it; over a network this becomes more difficult. A second decision is whether to use a database handler (an abstraction). Today SQL is a broadly supported query language and these database handlers are needed less; they do however have the advantage that when the database changes your application doesn't have to.
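A minimal sketch of what such a database handler interface might look like in Java (the type and method names are invented for the example):

```java
import java.util.Optional;

// A "database handler": the processing tier talks to this interface, so a change
// of schema or database product only affects the implementing class.
public interface CustomerStore {
    Optional<Customer> findById(long id);
    void save(Customer customer);
}

// Plain value object used by the interface; fields are illustrative.
class Customer {
    final long id;
    final String name;
    Customer(long id, String name) { this.id = id; this.name = name; }
}
```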

Any of the interfaces of a tier could be made into a service. This is only logical if you end up with 1) a loosely coupled interface and 2) an interface that is used by many requesters.

There are three common distributed architecture patterns in use:



1. Middleware bus (or ring) architectures (tightly coupled)
   Middleware that unlocks core applications to other applications. It is fast, secure, and flexible. Most middleware buses are custom built and many organizations worry about maintaining them.

2. Hub-and-spoke architectures (somewhere in between)
   A hub is a server routing messages. It can do a lot, like routing messages based on type or origin, reformatting the message, broadcasting the message, adding information, etc. A hub can thus act as a bridge between different networking and middleware technologies. Using hubs makes everything more flexible, but a hub also forms a single point of failure, another link in the chain, and another technology to pay for. Especially with a lot of ad hoc software you could end up with something very complicated where you no longer know what does what and how things relate to each other.

3. Web services architectures (loosely coupled)
   These use standards such as SOAP, WSDL, and UDDI. Technically they are just a collection of middleware technologies. They form a cheap alternative for integrating software. Compared to traditional software, web services are slower because of the translation to XML messages. Also, many organizations provide interfaces to web services.

These architectures are not mutually exclusive; many organizations have all of them. Many however fall into a fourth category: ad hoc, or point-to-point architectures.

Coupling is the degree to which one party must make assumptions about the other party. The more complex the assumptions, the tighter the coupling. Tighter coupling requires more changes than loose coupling. Ideally you can test each component individually. Dependencies fall into several categories: protocol dependency (use the same middleware standard), configuration dependency (how does it cope with changes?), message format dependency, message semantic dependency (how to interpret the message), session state dependency (what order do messages need to be sent/received in?), security dependency (some services not available to some users), business process dependencies and business object dependencies (identifying objects in different actors). All of these fall into three groups: the technology dimension, which can be resolved by following the same standards, which are inherently flexible (e.g. XML); the application dimension (message format and session dependencies), which can only be changed in the application; and wider concerns (business process, business object and again security), which may require changes in many applications.

Tight and loose coupling thus has two dimensions: apps can be loosely coupled along the technology dimension but tightly coupled along the application dimension.

Chapter 7: Resiliency

Parallel servers greatly reduce server downtime. If one server is down 1 in 100 days, two would both be down 1 in 10,000 days (theoretically).

The obvious way to improve resiliency is to use backup servers. Recovery consists of four steps:

1. Detect the failure: a heartbeat feature checks whether the primary server says "Yes, I am still running". This tells you next to nothing and you need to do extra tests to figure out what's wrong and whether the production application really is running; e.g. the response times of the last 10 transactions can be useful. When you switch to the backup in case of an error, lots can go wrong; plus many software and operational problems are not cured by a switch. Very resilience-conscious sites will make the switch easily because they have taken care of these problems; most organizations try to avoid switches. (A minimal heartbeat sketch follows this list.)

2. Clean up work in progress: there are two ways to back up a server: 1) copy database logs and apply them against a copy of the database, or 2) have the disk subsystem keep a mirror copy of the disk on the backup system. The first is more efficient, the second easier. The database should clean up any uncompleted transactions.

3. Activate the application: once the database and message queues are tidied up, users need to be forwarded to the backup server, by either running it under the same IP, using an intelligent router, or a special protocol written between client and server (starting a new session without telling the user). Batch recovery is also described in the book but I can't really understand it (p 127-128, have fun). In distributed systems there are three options: 1) the client failing, with a recovery similar to batch recovery, 2) the server failing, simply recover the last transaction, and 3) both failing at the same time. It is easiest to keep state stored in the database and keep server and client stateless.

4. Reprocess "lost" messages: database recovery alone is not enough. There are two problems when a server fails: 1) the client does not know whether the transaction was completed or not, and 2) if it did complete, what was the last output message? You can solve this by storing a sequence number separately and letting the client interrogate the server.
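The heartbeat sketch promised under step 1; a minimal polling check in Java, where the health URL and the switch-over decision are purely illustrative:

```java
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// The backup polls a (hypothetical) health URL on the primary every few seconds.
// As the text notes, a "yes I am still running" answer tells you very little, so a
// real check would also look at recent transaction response times before switching.
public class Heartbeat {
    public static void main(String[] args) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(() -> {
            try {
                HttpURLConnection con = (HttpURLConnection)
                        new URL("http://primary-server/health").openConnection(); // hypothetical URL
                con.setConnectTimeout(2000);
                con.setReadTimeout(2000);
                if (con.getResponseCode() != 200) {
                    System.out.println("Primary unhealthy: run extra tests, maybe switch to backup");
                }
            } catch (Exception e) {
                System.out.println("No heartbeat: " + e.getMessage());
            }
        }, 0, 5, TimeUnit.SECONDS);
    }
}
```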

A switch can cause major delays. You can have the users active on both the operational and backup systems at the same time: dual activity. Two approaches:

1. Clustering: the database (which is mirrored) is shared by both systems. This has buffer management problems, lock management, lock manager duplication and duplicate log files. These problems are all solved, but at the cost of lower performance.

2. Two-database approach: each system has its own database. Each transaction is processed twice, once on each database. Two-phase commit ensures the systems are kept in sync. If one goes down the system simply stops sending it sub-transactions. Two problems: 1) when the failed system comes back online it must catch up on all missed transactions; the problem here is that they need to be processed in the same order the commits were processed (thus not the input order). 2) The system needs to handle the situation where the client connection to both systems is working just fine but the connection between the two systems is broken.

The network however is often very complicated, involving many clients and many servers. The router is a single point of failure as well and requires backup. Also, a switch would go seamlessly when web servers maintain two connections, one to the active transaction server and one to the backup, and only use the active connection.

Most downtime is caused by planned downtime. You can use online backup copies or special disk-mirroring features to copy the database onto another machine. Online backup copies require after-images to bring the database back to a consistent state. Most large-scale enterprise disk subsystems have features that make a mirror copy and then logically break the disks from one machine and attach them to another. You can handle many configuration changes in hardware and software by intelligent use of backup systems (take the backup offline, change software or hardware, bring the backup online, switch to the backup, repeat the process on the other machine).

In many ways application software failure is the worst kind of error. Programmers should 1) prevent bad data getting to the database, 2) look for corruption in the database, 3) display as much information as possible about any errors detected.

In the worst case the database can get corrupted. If it does you need to restore an uncorrupted backup, if you can find the uncorrupted point in time. Older traditional mainframe transactions are generally a lot easier to recover than object middleware.

IT people too often think backup is an IT issue where it is actually a business concern, and designers need to set resiliency goals; adding resiliency costs money and the cost/benefit trade-off is a business analysis.

There are three parts to resiliency analysis:

1. Data safety analysis (distributed, or two central databases, or departmental databases, etc.)
2. Uptime analysis (downtime is a nuisance, and inactive workers cost money, but losing data is much worse; what uptime do you really need, at what cost?)
3. Error-handling analysis (look at errors caused by external factors (e.g. no manager in the department) and those caused by IT infrastructure breakdown, user errors, and program errors)

Chapter 8: Performance and scalability

We have been looking at Moore's law for 20 years but still performance issues remain. A reason for this is the un-slippery slope: there is a 'gap' where processors wait a lot because they do not have the right information in memory. Even though processors have a cache these days, the problem remains. You can dampen this effect by introducing sophisticated hardware architectures.

To push a program down the slope you can: 1) reduce active memory, 2) reduce the number of task switches, 3) reduce lock contention, 4) reduce the number of IOs, 5) reduce the number of network messages, 6) reduce the number of memory overlays.

Bottlenecks in transaction processing:

- Network: a 10-Mbit LAN delivers considerably less than 10 Mbit depending on the number of devices connected to it.

- Disk throughput: because IO operations are not evenly distributed over all disks you need many more disks to handle the load, ending up with loads of hard disks most of which are doing very little.

- Total efficiency: you need roughly 30% idle time because queue times deteriorate after that. Check the formula: Total time = service time / (1 - utilization). At 70% utilization the total time is a bit over 3 times the service time (1 / 0.3 ≈ 3.3); at 90% it would be 10 times.

- Memory can run out very quickly; 2 MB for each terminal connection would surpass the 4 GB limit of many OSes with 2000 connections. A transaction monitor is a solution; the number of parallel copies is then the number of active transactions rather than the number of terminals.



Object interfaces: some languages (e.g. Visual Basic) encourage you to fetch attributes one by one for each transaction. Where you used to have "read and write" or "find, get, and update", you now have "find, do 10 messages to get 10 attributes, display, do 2 updates", increasing network traffic and especially processing power.

Transactional component servers: for various reasons, e.g. J2EE application servers do not scale well vertically (= putting more than 4 processors on them). The solution is to scale horizontally: add an extra server, set it up the same and balance the workload.

Two-phase commit: causes extra messages, thus network traffic and processing power. The transaction also takes longer and database locks are held longer (= lock contention).

Message queuing frees up network and processing load but does increase the time for a message to arrive.

Remote database access for real-time transactions is inefficient. To alleviate the workload you can use stored procedures, but then you convert a database server into a transaction server. Instead, use a real transaction server (e.g. .NET Enterprise Services or Enterprise JavaBeans) with better support for multithreading and connection pool management.

Batch

There are three reasons for batch:

1. Support cyclical business processes (payroll or bank interest accrual)
2. Support housekeeping tasks (copying the database to backup)
3. Optimization (for example, when you have to insert 100,000 records, each with an index, translating into between 200,000 and 400,000 IOs, it would take a few hours!)

With 24/7 economies and the internet, the time window to do batches in has been shrinking.

Distribution

In general distribution has four problems:

1. It requires a great deal of extra coding; transactions, reports and inquiries become more complex
2. Evenly spreading data is hard; larger machines likely support much more traffic than smaller ones
3. Considerable additional overhead, such as two-phase commits
4. Much more difficult operationally

Unless you built the application as a distributed system from the ground up, it is very difficult to change.

Load balancing

Fool the client, when he accesses an IP address, into thinking there is only one web server where in fact there are a lot more. The problem is mainly keeping the data equal on all servers. The easiest solution is to put all the state in the back-end database (avoid stateful session beans, for example).

Load balancing on database servers is tricky; if there are 'hot spots' of data which transactions want to change all the time, performance will be poor. Examples are control totals and the ends of indexes. You have to design load balancing with that in mind.

Business intelligence systems

BI systems is a broad term used for anything from data marts to decision support systems. Two performance issues:

1. Ad hoc database queries: large queries will dominate the IO capacity (e.g. memory usage, database buffers; they squeeze out the rest of the work). Data replication to other database servers can solve this problem.

2. Data replication: introduces its own set of problems. The basic idea is simple: make a copy and keep it up-to-date. It increases network load (although you can batch updates together) and the IO load is generally higher on a query database than on a transaction server. You could let the target machine get behind and catch up during the day, or add more memory.

Backups and recovery

Basically, the bigger the organization, the more failures you get and the longer it takes to recover from them, while at the same time you want fewer of them. When you have to deal with large amounts of stored data (e.g. a bank with 2 million transactions per day) and you back all of it up, it will take quite a while (more than a day even) to restore the database.

Web services

All issues with online transaction processing equally apply to web services, but they have some issues of their own.

Generally web services use SOAP built on XML, which takes time to process, and run over HTTP, which is much slower than DCOM, RMI, or raw sockets.

In addition the internet slows you down depending on the number of hops and the distance you have to travel. Secondly, many user queries in the web browser require information from different systems, adding to the load.

UDDI and WSDL take up extra space as well; therefore many rather use just SOAP on their network to get familiarity with the technology.

Design for scalability and performance

You should consider performance and scalability early in design because:

1. The performance consequences of data distribution can be assessed early on
2. The difference between deferrable transactions and real-time transactions should be noted. Use message queues for deferrable transactions and distributed transaction processing for real-time.
3. The application design model gives the data and transaction volumes.

At the early (high) level you measure the scale of the problem. At a more detailed level you can look at actual data. At the very detailed level you can investigate the interfaces, code and database usage profiles.



At the start of the chapter the TPC-C benchmark was explained (it measures how many transactions can be performed by a processor), but after all the information here you can understand that a lot is missing in that test; mirror disks, restart areas and other resiliency code are left out.

Chapter 9: Systems management

Systems management can be divided into 5 categories:

1. Administration: concerned with all aspects of managing the configuration of a system
2. Operation: concerned with keeping the system running
3. Fault tracking and resolution: information about faults not immediately resolved
4. Performance management
5. Security

The functional (first four) categories are interrelated, see page 171. Often these categories are managed by different groups in the business.

Between the sixties and the eighties systems were often just silos (see page 172) but slowly migrated to distributed systems (page 172), and with that systems management considerations increased considerably. From only internal users we now have an internal IP (and other) network (workstations, routers, switches, browsers, etc.), an outsourced IP network (managed by a network provider) and the Internet.

A number of environmental attributes have an effect on systems management:

- The environment is very complicated; the number of systems may run well into the thousands
- The large number of components means there are huge numbers of different conditions; how do you distinguish between what is important and what is not?
- The IP network is outsourced, which makes it difficult to fix problems if any arise (e.g. a teller cannot reach the banking system).
- Web services are outsourced as well.

These days you have more complex administration tools allowing you to manage multiple systems from one PC using graphical interfaces instead of plain consoles. See page 178 for what the systems management model looks like.

A rules engine consolidates many errors into more useful information and makes systems more manageable. How this information is communicated from several systems to one has been standardized; the best known standard is the Simple Network Management Protocol (SNMP). Devices of different vendors can now be managed by one management tool. It is most effective at managing routers and switches, but less effective at managing system software and applications.

Collecting performance information in a system requires an agent which runs in the system being measured, and contains configurable 'hooks' or 'probes' that collect the required information. The manager itself (with the reporting tools) is often on a different system and aggregates information from multiple systems.

In the olden days performance measurement tools were written manually, but a lot is available now off-the-shelf. Current attention is aimed at self-management (self-correcting).



Guidelines in putting it all together:

- Scope: meaning the amount of the environment over which the management functions apply. You cannot control everything anymore. You can think of horizontal "slices" of the environment. Take middleware as an example: this has to work in its entirety and operations staff should monitor this slice carefully.
- Automation: a high level of automation is essential to reduce costs and improve quality.
- End-to-end service provision: vertical management. There are many components but they never exist in isolation; collectively they deliver services to users. You should gain insight into the end-to-end service status.
- Applications can contribute to systems management. Logs, for example.
- Enhancing the systems management environment: use an evolutionary approach. Don't throw everything away but improve on it. You cannot be fully aware of what the introduction of new technology will do exactly.

Chapter 10: Security

Authentication = identifying users
Access control = authorization, giving users authority
Protection = stopping unauthorized access to resources
Security management = how to administer and report breaches in security

One tip in managing roles for users: keep it simple; merge roles if they have the same privileges, for example.

Three issues in early security design:

1. How do you assign roles?
   Often when changing roles and privileges, people just give others their logins or download the data and mail it around.

2. How is duplicate data protected?
   In any organization data is replicated; how you keep this safe is a difficult issue.

3. What strategy is there to guard against the enemy within?
   Most frauds have inside help. Let two or more people give permission for something, build fast detection systems, and assume violation is possible and build in recovery procedures.

The onion model is illustrated on page 190. Each concentric ring represents a protective screen and contains one or more resources. The rectangles represent access points and make a number of services available to the outside ring/world. To get from the outside to the inner services a request must pass all rings and thus multiple access points. The most secure data should be in the inner circle, whereas less sensitive data is in the outside circle (the ring that is easiest to breach).

Still, this doesn't solve authentication problems; if someone breaks into the authorization servers, the attacker can assume any role he wants. It is only as secure as the authorization mechanism is. Dividing information over several locations helps; if someone gets administration rights for a web server but no data can be found on it, the damage can be low. Also make sure that assigning roles to users is thoroughly protected.



Because of legacy software and different branches within organizations the onion model is actually a 'boiling lava model' (see page 193), with several little and bigger 'onions' and access points being skipped! Also, many access points require different passwords, and people will start writing them down. The boiling lava model is much more difficult to control and enforce security policies in. Especially enterprise-wide data (e.g. product data and customer data) suffers access problems across separate organizational branches.

The onion model does not work well with the Web. Web services don't trust each other's authorization/authentication very much and many have their own security management.

You can use security tokens (a piece of data identifying the user) on the Web that act as a pass giving you access. See page 197 for how it works. You can include a security token in a SOAP header using the WS-Security standard. A security token is supplied and validated by a security token service. All the services controlled by one security token service together are called a security context.

Most websites use SSL (Secure Sockets Layer) to encrypt their sign-on process (see page 195 if you want to know more about SSL).

There are two forms of encryption:

1. Asymmetric (or 'public key'): a different key to encrypt than to decrypt. These are slow, but you can publicly publish your key and anyone can send you secure messages.
2. Symmetric (or 'private key'): the same key for both; it is fast. A well-known example is DES.

On a completely different note: should requesters trust service providers? Often they have no choice. A timeout on security tokens can help with this problem.

To develop security it is often easiest to use a network diagram and draw the access points and security contexts (read: rings) in the diagram. Especially with legacy software that has its own security it becomes problematic to achieve single authentication.

Chapter 11: Application design and IT Architecture

Everything described in this chapter is supposed to be done before the application project is under way; it is about IT planning.

At first people didn't grasp the enormity of programs and just started coding. When they did, they started to program structurally and with waterfall development, which of course includes the requirements-design-implementation structure. There are several problems with requirements: 1) end users and the business don't know their requirements, 2) it is difficult to express the design in a way that is understandable for both programmers and business sponsors, 3) a division among requirements, design, and implementation leads to over-engineering (especially in large organizations). Waterfall models don't work well with changing requirements.

Alternatively, you can use agile methods that build iteratively and request feedback from end-users often to elicit more detailed requirements. The Agile manifesto (from the Agile Alliance) states:

- We value individuals and interactions over processes and tools
- We value working software over comprehensive documentation
- We value customer collaboration over contract negotiation
- We value responding to change over following a plan

Extreme Programming (XP) does a minimum of design and the only artifact that matters is the code; changes are not really anticipated up front. XP instead builds a large test library up front so changes can be made without much worry.

These two schools of thought on design are referred to as "design up front" (planned) and "design as needed" (Agile). The authors prefer a third approach (discussed later) that takes design in three levels: business level, task level and transaction level.

MDA (Model Driven Architecture), currently gaining popularity, aims to develop the program directly from the design.

Business rules determine how facts (= data) are structurally defined and processed. There are five rule patterns:

1. Constraints (e.g. "an urgent order must be accepted if the order value is less than 30 dollars")
2. List constraints (e.g. "user status is raised if he 1) spends more than 1000 dollars and 2) has been a member for over 12 months")
3. Classification (e.g. "an order is urgent if the delivery must be done in less than 3 hours")
4. Computation (e.g. "ratio = price/earnings")
5. Enumeration (e.g. "customer standing can be gold, silver or bronze")
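As a loose illustration of how such rule patterns end up in code, here is a small Java sketch; the Order type and all field names are invented for the example:

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical order type with the fields the rules below need.
class Order {
    BigDecimal value;
    int deliveryHours;
}

// Each method encodes one of the rule patterns listed above.
public class OrderRules {
    // Classification: "an order is urgent if the delivery must be done in less than 3 hours"
    static boolean isUrgent(Order o) {
        return o.deliveryHours < 3;
    }

    // Constraint: "an urgent order must be accepted if the order value is less than 30 dollars"
    static boolean mustAccept(Order o) {
        return isUrgent(o) && o.value.compareTo(new BigDecimal("30")) < 0;
    }

    // Computation: "ratio = price/earnings"
    static BigDecimal peRatio(BigDecimal price, BigDecimal earnings) {
        return price.divide(earnings, 2, RoundingMode.HALF_UP);
    }
}

// Enumeration: "customer standing can be gold, silver or bronze"
enum CustomerStanding { GOLD, SILVER, BRONZE }
```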

There is no mathematical theorem for business rules; much depends on the context.

Almost any application has some element of systems integration in it. You can leave legacy systems as they are, make minimal changes or major changes depending on the situation. Most organizations make these decisions based on the age of the technology on which the system is built.

Object-oriented programming aims to improve reuse. A marketplace for reusable components never really took off. Why?

Development must be broken down into three roles (see page 212):

1. The programmer who writes reusable components
2. The assembler who writes scripts that call the components
3. The administrator who writes the deployment descriptors and configures the production system

In most cases the assembler cannot find the right components he needs (because of performance issues, it does too much, it does too little, etc.), so the assembler will often become a programmer as well. You have to be very lucky to find exactly what you need (the authors of the book call it serendipitous reuse, to be translated as 'fortunate reuse').

An alternative to serendipitous reuse is architectural reuse: a top-down approach instead of bottom-up. You enforce reuse by providing only a limited set of reusable components (one error-reporting mechanism, for example).



Many silos are considered bad, but the alternative is bad as well: one monolithic application no one understands. We seem to be bad at looking over a large number of requirements and splitting them into logical chunks.

An enterprise architecture should focus on the problems faced by the organization and outline a solution. Organizations generally look for 1) faster development, 2) cost reduction and 3) better security, reliability and performance. The architecture should, very simply, define how this should be achieved.

As with any large design process, many different people design different parts and often upper designs have to be revised because something isn't possible at an appropriate cost.

The top layer of an IT application would be the business process level. The second level is the task level (see these as dialogues between users and the system). At the bottom level there is design and implementation, called the transaction level.

RUP (Rational Unified Process) starts with Inception: defining the project boundaries. However, the authors of the book suggest you need to do business-process-level design first to know these boundaries. They also suggest eXtreme Programmers do business-process-level design first, despite their notion of 'design as needed'. At task level the complex tasks should be written by the most experienced programmers and discussed with the group. See page 222 for a table describing the details of these levels. According to the book's authors, this design-in-levels approach is to be preferred over planned and agile methods.

Chapter 12: Implementing business processes

Until recently functional analysis was much more common than process analysis. A simple example would be a car rental process. After some time a gold card is introduced that speeds up the process. Because the previous system was not written in smaller sub-processes, a new application is developed for the gold card, without reusing anything. Now there are two applications fulfilling one business process: car rental.

Everything a business does can be described in processes. Some processes are however more defined than others. A process is a series of activities that delivers something. Activities themselves can be processes; wiring a house is an activity of the process of building the house, but wiring a house is a process in itself. The lowest level of detail that cannot be subdivided is called a task and is usually done by one person in one place at one time and requires some skill or resource.

Many processes trigger other processes, which the book calls "send and forget".

A prescriptive approach is fast and repeatable; such processes are documented diagrammatically or in a series of steps. An alternative is to have a plan and a process for converting the plan into a one-off process definition.

To summarize:

- A process delivers something, usually some goods or a service
- A process follows a plan that defines the order of the activities
- Activities can sometimes be processed in parallel
- Activities can be conditional, that is, process plans can have path choices
- An activity can be a process in its own right
- A process can start another process (send and forget)
- A process may be ongoing, meaning it loops back on itself
- There are two extremes of plans: very prescriptive, or a plan that defines rules that must be obeyed
- In practice process execution may deviate from the plan
- In practice many companies have many processes going on at any one time and these are likely to be competing for resources

Information outside its process context has little meaning. Information falls into one of four categories (see page 230):

1. Plan objects: information about process plans
2. Tracking objects: information about where an individual process has got to in its execution
3. Resource objects: information needed by the process
4. Result objects: information that is a process deliverable

Often tracking objects are combined with result objects (e.g. a half-finished order is completed, making a tracking object the result object).

There are four patterns for architecting processes:

1. Single centralized application (see page 231 for a diagram)
   Several processes with one application and one database.

2. Tracking multiple centralized applications (see page 231 for a diagram)
   Several processes with multiple applications, but each application would handle all processes. Advantages of (1) and (2) are that they only have to save data in one database and can be thoroughly secured. A disadvantage is the reliance on the network (although often this is not a problem).

3. Pass through (see page 232 for a diagram)
   Each application does its job and passes the data to the next application. Advantages: each app can have its own technology and the database can be restructured in any way you want. The disadvantage is that it has lots of timing dependencies.

4. Copy out/copy in (see page 233 for a diagram)
   Starts as pass through, but at the end of the app's job the data is sent back to a central database. Advantages are similar to pass through, plus it has a centralized view of what is happening to the processes. The disadvantage is that it is more complicated.

Normally you start clarification (part of the design process) by drawing process implementation diagrams (see page 234 for a diagram). Guidelines for doing this:

1. Don’t go into more detail than the task
2. If several tasks are closely related, group them in a box
3. At the database level, only mention the major tracking and resource objects; don’t mention everything
4. Represent a batch step as a single application
5. Don’t mention any technology

Once the process-level design is clarified, the next step is to analyze it. There are seven areas of analysis:

1. Performance
2. Resilience
3. Error handling: you can do three things when something goes wrong: 1) fix what needs to be fixed and try again, 2) leave the whole problem for manual reconciliation later, or 3) revert to another process (e.g. a manual one)
4. Data accuracy
5. Timing constraints: when you look at process steps, determine what dictates each transition. Some process steps have a limited timeframe for their execution or data transfer
6. Migration
7. Flexibility: process-level design lays down the requirements of what users will do; the process-level design team may not have the authority to make all the changes in the process

Process-level design helps in many ways, such as:

1. Provides a tool for improving the quality of data
2. Provides the fault lines where the system will change
3. Defines whether message flows between processes are deferrable or real time
4. Provides the rationale for resiliency requirements
5. Provides the rationale for performance criteria
6. Provides an underlying basis for the discussion about security
7. Provides an underlying basis for the discussion about data distribution

On page 239 there is an extra section that explains the difference between functions and processes. It boils down to this: most organizations have taken a strong functional view, which often works, but it is also the reason why IT builds lots of silo apps. The functional approach is department-driven rather than organization-wide, and each department wants to be responsible for as little as possible. The functional approach is also an insufficient way to analyze how to change the business.

Chapter 13: Integration design

Integration design is a major (but relatively short) element of task-level design which ensures the solution hangs together as an integrated whole. Integration design is the design of protocols between app and app, and between app and end user.

The integration design group needs to know the nonfunctional requirements, such as critical
performance goals, recovery time goals, and special security requirements.

When designing the database you should not just look at the task but take a bigger perspective; look at the business rules as well.


The output from integration design is not program design or component design but simply a
description of input and output messages and any session state needed to control the protocols.



A security session should outlast all the task sessions a user performs (see page 243 for a diagram). The sequence of tasks should be ACID (although perhaps a bit less strictly so). When a user is working on a tracking object, for instance, he should have exclusive rights. This could cause very long transactions, which is undesirable for performance reasons. Timing or pseudo locks partly solve this problem (= not one big lock but several short, smaller ones; a full rollback of the entire transaction becomes difficult in that case). An alternative is to copy the data locally (the copy out/copy in pattern).
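
As a rough sketch of the pseudo-lock idea (my own minimal illustration, not the book’s design; the names and the five-minute expiry are invented): a record is marked as claimed, with an expiry time, instead of holding a real database lock open for the whole user session.

    import java.time.Duration;
    import java.time.Instant;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class PseudoLocks {
        private final Map<String, Instant> claims = new ConcurrentHashMap<>();
        private final Duration timeout = Duration.ofMinutes(5); // invented expiry

        // Try to claim a tracking object for a user; an expired claim may be taken over.
        public synchronized boolean claim(String objectId) {
            Instant existing = claims.get(objectId);
            if (existing != null && existing.plus(timeout).isAfter(Instant.now())) {
                return false; // someone else holds an unexpired claim
            }
            claims.put(objectId, Instant.now());
            return true;
        }

        public synchronized void release(String objectId) {
            claims.remove(objectId);
        }
    }

Because no database transaction is held open, a crash halfway cannot simply be rolled back, which is exactly the drawback noted above.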

Task/message diagram = a diagram containing actors, messages, application processing and objects (see page 246 for an example). It displays the flow of data. You should only model data if it is important for the follow-up analysis and for clarification of the task.

The design process consists of: understanding requirements, brainstorming solutions, clarifying one or more chosen solutions, and analysis.

Analysis using integration design can be broken down into eight areas:

1. Scalability and performance: take the data volumes for each task and calculate the number of messages (a worked example follows this list)
2. End-to-end data integrity: take the task/message diagram and methodically trace through the flow, checking step by step whether anything can go wrong
3. Security: what roles are associated with each message?
4. System management: assess configuration/version control and monitoring; how do you get notified in time and how do you fix issues?
5. Enterprise data consistency: check whether data is duplicated in other systems (and whether that is bad), plus how much you depend on data from other apps and whether that data is accurate and complete
6. Ease of implementation: does your organization have the experience for this?
7. Flexibility for change: think of how the system might need to change in the future
8. IT strategy compliance: does the technology fit the chosen strategy?
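
For example (all figures invented): 20,000 rentals a day with 5 messages per rental gives 100,000 messages a day; if 20% of them arrive in the busiest hour, the servers must handle roughly 20,000 / 3,600 ≈ 6 messages per second at peak, and that is the number to compare against what the middleware and servers can sustain.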

A good integration design is loosely coupled. Tightly coupled applications are characterized by many short, real-time messages and complex session state; loosely coupled applications by large messages and no session state. There are gradations between tight and loose coupling (see page 252). Loosely coupled is more resilient than tightly coupled.

Loosely coupled systems require more work to synchronize databases and to read local copies of the data. Tightly coupled systems require more testing but probably less code.

You can also add tiers to integration design (see page 253). The authors argue at length (and, to me, not understandably) that semi-coupled systems would be best; see page 252 for why, I couldn’t follow it.

Chapter 14: Information access and information accuracy

Database design occurs during task-level design.

This chapter discusses four aspects of database design: 1) information access, 2) information accuracy, 3) shared data or controlled redundancy, and 4) integration with existing databases.

Information access



See page 258 for the information access diagram.

There is no simple solution when accessing information. Depending on how many objects you want to access and on performance issues, you can adopt different strategies.

Generally there are requirements to reformat the data to make it more understandable, but people want to see the raw data as well. Denormalization of data means codes such as “LHR” are converted into “London Heathrow” by joining lookup tables with data tables.

But again, the manager might need to look up the abbreviation used for the production department.
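
A minimal sketch of that kind of denormalizing join, assuming invented table and column names (flight, airport, iata_code, full_name) and an already opened JDBC connection:

    import java.sql.Connection;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class Denormalize {
        public static void printFlights(Connection con) throws SQLException {
            String sql = "SELECT f.flight_no, a.full_name "
                       + "FROM flight f JOIN airport a ON f.destination = a.iata_code";
            try (Statement st = con.createStatement(); ResultSet rs = st.executeQuery(sql)) {
                while (rs.next()) {
                    // "LHR" stored in the flight table is shown as "London Heathrow" from the lookup table
                    System.out.println(rs.getString(1) + " -> " + rs.getString(2));
                }
            }
        }
    }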

When building a data mart (which is smaller and cheaper than an enterprise data warehouse) you should build it using data from the data warehouse whenever possible.

Most reports that used to be printed in the morning are now available online in business systems. These reports are typically built using SQL queries processed directly in the database. Data marts help with these but do not replace the old-style reports entirely.

To achieve process improvement (e.g. in the speed at which an order is fulfilled) you should log timestamps and similar data. However, while designing such systems we often forget these.
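
A minimal sketch of what such logging might look like (class, method and step names invented), so that cycle times per process step can be reported on later:

    import java.time.Instant;
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class ProcessStepLog {
        private final Map<String, List<String>> entries = new HashMap<>();

        // Call this every time an order moves to a new step, e.g. "PICKED" or "SHIPPED".
        public void record(String orderId, String step) {
            entries.computeIfAbsent(orderId, k -> new ArrayList<>())
                   .add(Instant.now() + " " + step);
        }

        public List<String> historyOf(String orderId) {
            return entries.getOrDefault(orderId, List.of());
        }
    }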

When dealing with customers (e.g. when they want to check the status of their order) they require up-to-date information. A data mart wouldn’t suffice; data marts are based on older data loaded using copy out/copy in.

Information accuracy

Reasons why data is inaccurate:

- It is out of date
- Wrong conclusions are drawn from the data
- Information is duplicated (and you don’t know which copy is the correct one)
- Information was input incorrectly

Often IT applications, and the requirements users have of them, are part of the problem; e.g. users want to type the name directly into the order form instead of looking it up first.

In almost any organization there is data duplication somewhere. Merging the duplicates is difficult because of three questions:

1. How do you know you are talking about the same object?
2. How do you know whether an attribute is the same?
3. If the data for one attribute of the same object is different in the two databases, what then?

The last one is always unsolvable.

Shared data or controlled duplication

See page 267 for a diagram of both these solutions.

Shared data implies a common database, typically separate from other databases. Disadvantages: 1) poor performance, 2) some remote database access mechanisms do not support two-phase commits, and 3) the structure of the shared database is hard coded, making changing schemas difficult. To get around these problems, see page 268.

Embedded call on shared data component = a component interface is put above the shared database so external systems no longer communicate with the database directly; changing the database itself then becomes less of a problem. This works best if the data is used in many places.

Front-end call = similar to the embedded call, but the client machine now also has an extra interface layer that communicates with the shared database’s abstraction layer. This is even more loosely coupled. It is best used if the data is used in fewer places.
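
A minimal sketch of the component-interface idea behind both variants (all names invented): callers program against an interface like this rather than against the shared database’s schema, so the schema can change without breaking them.

    // Callers depend only on this interface, never on the shared database's tables.
    public interface CustomerStore {
        Customer findByNumber(String customerNumber);
        void changeAddress(String customerNumber, String newAddress);
    }

    // Plain value object returned to callers; its shape is independent of the table layout.
    class Customer {
        final String customerNumber;
        final String name;
        final String address;

        Customer(String customerNumber, String name, String address) {
            this.customerNumber = customerNumber;
            this.name = name;
            this.address = address;
        }
    }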

Controlled duplication has the same data in two or more databases. The advantages are better resiliency and better performance; the disadvantages are that the data is slightly out of date and the application is more complex.

Getting the data exactly the same in both databases can be troublesome; think of Java multi-threaded synchronization problems and you get the idea. Two-phase commits are not ideal because if one system went down the other would be badly affected. Instead, you can have one database do all the updates and only after that update the second one; the second database is then slightly delayed. As with shared data, the interfaces of the databases should be exactly the same so you can move and change machines easily.
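
A minimal sketch of that delayed-update approach (names invented, database calls stubbed out in comments): the primary copy is updated first and the same change is queued for the duplicate, so neither update blocks the other and no two-phase commit is needed.

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    public class ControlledDuplication {
        private final BlockingQueue<String> pendingForCopy = new LinkedBlockingQueue<>();

        public void updatePrice(String productId, double newPrice) {
            // 1. Apply the change to the primary database (stubbed out here).
            // 2. Queue the same change for the duplicate database.
            pendingForCopy.add(productId + "=" + newPrice);
        }

        // Run by a background worker: the copy lags slightly but is never half-updated.
        public void applyNextToCopy() throws InterruptedException {
            String change = pendingForCopy.take();
            // Apply 'change' to the duplicate database (stubbed out here).
        }
    }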

You can even make a hybrid version of controlled duplication and shared databases.

Creating consistency in existing databases

Say you want a silo app to share its data; there are three problems:

1. The technical problem: convert the old program to use a new component interface or equivalent. Creating component interfaces is often not easy on old technology.
2. The data migration problem: move data from the old database to a new one that may be formatted differently. This can mean a lot of manual labor, because we don’t really trust AI for this.
3. The business process problem: change the business process to use the new data. Whatever the technology, data will always remain a business responsibility.

When the database supports ODBC and OLE DB interfaces and facilities this change can be done very
easily.

An information broker pattern (where one system updates its data and broadcasts the update to the other systems) has data integrity issues (when two systems broadcast updates on the same data at the same time), and the killer problem is that there is often no common object identifier across all the data, so the systems don’t know how to translate the updates they receive.
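
A minimal sketch of why the identifier problem bites (system names and keys invented): before a broadcast update can be applied, the receiving side needs a cross-reference from the sender’s key to its own, and such a table usually does not exist.

    import java.util.Map;

    public class IdCrossReference {
        // In practice this mapping is exactly what is missing between silo applications.
        private final Map<String, String> crmToBilling = Map.of("CRM-1001", "BILL-77");

        public String toBillingKey(String crmKey) {
            String billingKey = crmToBilling.get(crmKey);
            if (billingKey == null) {
                throw new IllegalStateException("No common identifier known for " + crmKey);
            }
            return billingKey;
        }
    }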

The information controller

The authors of the book introduce the title of information controller for the person who is responsible for the accuracy and quality of all data in the organization. Data quality is best fixed when the data is input; thus the information controller has a very visible job. He should understand what is meant by data access and transactions.



Chapter 15: Changing and integrating applications

Little has been written about how to change legacy applications for a new situation. The previous chapter discussed how to migrate the data. The book gives an example which chops the before and after situations into several smaller projects and prioritizes them. Doing everything all at once would take too much time before results are visible.


Use a high-level business process model of the system and map the current IT applications onto it. From this information, identify the necessary shared data and the integration dependencies (real-time and deferrable) between the applications.

Usually, most of the changes from the before situation to the new situation are data consolidation and adding a presentation-layer channel.

To build the presentation layer there are two options: 1) leave the old (green-screen) interface and build a new one on top of it (also called screen scraping), or 2) a transaction service interface (see page 280).

Screen scraping must do the following (see the sketch after this list):

- Log on to the back-end system
- Do any menu traversal required
- Reformat the data into a string of text that precisely conforms to the format expected by the existing application
- Take out the fields from exactly the right place in the received message
- Keep track of the position in the dialogue of many end users simultaneously
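
A minimal sketch of just the field-extraction part (screen layout and positions invented): on a fixed 80-column green screen a field is identified only by where it sits in the character buffer, which is why scraping is so fragile; any change to the screen layout shifts the offsets and silently breaks the scraper.

    public class ScreenScrape {
        // Customer name assumed to sit on row 3, columns 1-30 of an 80-column screen.
        public static String customerName(String screenBuffer) {
            int start = 2 * 80; // rows are counted from 0 here, so row 3 starts at offset 160
            return screenBuffer.substring(start, start + 30).trim();
        }
    }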

On top of that you need to take care of error handling. If you have to apply screen scraping to ten screens, it is a lot of work. A single web message might correspond to a load of smaller green-screen messages, and for each of those an error could occur, after which you would have to roll back (two-phase commit). See also page 282.

Thus, a transaction service interface is the preferred choice. You need to do three things to make this happen:

1. Change the existing production code: most presentation and security code can be removed, because this now lives in the presentation layer. In many cases the old code cannot simply be taken as the basis for reading and writing data, and some of its logic ends up in the presentation layer as well; e.g. a new customer who has not yet been validated is stored locally and used for a new order. Such ordering dependencies are a reason to store state (which you don’t want), as are security and temporary data gathering. You should solve this as well as possible by sending everything in one large message (see the sketch after this list), or by storing the state in the database.

2. Wrap old technology with new: you create a new interface for an application by installing software that converts the old interface to the new one. This can be easy or very hard; you’ll only know once a programmer tries. If it is too difficult you may still need to go for a new application. Most important here is that all transactions must be stateless; all the baggage that comes from a terminal interface must be eliminated.



3. Look at the impact on the business processes. One option is the core server model: build your processes very flexibly and put all core processes in a transaction server; this requires that all state is recorded in the object itself (how far it has got in the process). Advantages: 1) the core server itself is stateless, 2) process rules are implemented in one place, 3) it is not difficult to mix presentation modes (e.g. web or telephone call), and 4) different core activities can be physically located on different servers. The alternative is the reuse model: each presentation-layer refinement is implemented as a separate application, and instead of calling core processes these are incorporated into the application. Advantages: 1) it is fast (data is stored locally) and 2) processes are easier to change. Disadvantages: 1) less assurance of consistency, 2) it is impossible to mix presentation modes, and 3) it is hard to reuse components and modules.
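
Referring back to point 1, here is a minimal sketch of what a stateless transaction service interface might look like (all names invented): everything the server needs arrives in one message, so nothing has to be remembered between calls.

    import java.util.List;

    // One self-contained request message: customer details plus the order lines.
    class NewOrderRequest {
        final String customerName;
        final String deliveryAddress;
        final List<String> itemCodes;

        NewOrderRequest(String customerName, String deliveryAddress, List<String> itemCodes) {
            this.customerName = customerName;
            this.deliveryAddress = deliveryAddress;
            this.itemCodes = itemCodes;
        }
    }

    interface OrderTransactionService {
        // Returns an order number; the server keeps no session state between calls.
        String placeOrder(NewOrderRequest request);
    }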

Changing from (old) file transfer applications to message queuing involves training, installation, configuration, development of operating procedures, and program changes to existing applications. The benefits of message queuing over file transfer are responsiveness and integrity.

Similarly, moving from RPC to transactional component middleware has long-term benefits but not a quick return on investment.

Batch runs are still very common. These days many banks calculate real-time balances yet run their batch processes at night to do the actual debits/credits on the accounts. Because of the 24-hour society, the batch runs need to be shortened more and more. There are four ways to do this:

1. Shorten the administration (discussed in chapter 7, which covers resiliency)
2. Shorten the batch process; usually this means running processes in parallel
3. Run the batch alongside the online processing; as long as the transactions are short this can be done. The odd consequence is that the batch then runs alongside the presentation-layer elements (see page 294, second diagram) and competes with them for the transaction server
4. When the batch process needs the database to be frozen (e.g. for reporting functions), replace the batch program with online programs that do things differently (see page 294; I didn’t really understand this one)

In many ways running a batch is like running a giant transaction.

Chapter 16: Building an IT Architecture

An architectural model should be a guide for implementation and be modified when necessary; it is not something to be put on a shelf. It includes functional and nonfunctional requirements (resiliency, performance, manageability, and security of the resulting system).

Chapter 16 includes several case studies which are not summarized in this document. Check them out for yourself to get more of a feel for practical applications.

Section 16.3 sums up the main points of the book in only a few pages. Read it! (if you made it this far).

A last word of note: each chapter has its own small summary. These summaries sometimes deviate from the actual chapter; a better title for them would be “What we think you should remember”. They are also a good read!