Improving Incident Management Utilizing a Federated CMDB

Oct 21, 2013

By Marc Grosz

Figure 1. Management of IT Complexity (2)

Introduction


It’s no secret that the world is becoming ever more reliant on IT systems. Along with this
increase in reliance comes a virtually exponential increase in complexity. As a result, even the
best IT engineers are incapable of holistically understanding the systems they are responsible
for. Despite the best standards and processes, incidents are inevitable. These incidents can
potentially cost businesses millions of dollars as well as reputation. (1) Thus, a solution needs
to be developed that will aid in more quickly identifying the true impact of an incident and which
engineers should be involved in resolving the issue. The current systems are modified asset
management systems that are inflexible. The solution outlined in this paper favors a virtual
schema that doesn’t store any information and solely relates current systems.

Problem Statement

Referencing Figure 1, it is apparent that there is a large gap, one that will only widen as time
goes on, between the complexity of IT systems and the ability of those entrusted with managing
them. This gap results in longer times to resolution when incidents occur, which in turn leads to
losses of productivity, money, and reputation. (2)


As an example, companies have been integrating their financial systems into large, complicated
ERP systems over the years. A consequence of this integration, which helps boost the productivity
of the company as a whole, is that if anything happens to even a part of this system, the impact
is far reaching. All of a sudden customer service can’t look up customer information to service
calls, accounting personnel are no longer able to service orders, and tightly integrated
applications like on-demand purchasing stop working. All of these scenarios directly relate to
customer satisfaction, and the longer the service is down, the worse the company looks and the
more money is lost. Currently, if there is any monitoring, engineers will be alerted to symptoms
of the problem. The ERP admins may be alerted that their applications are no longer communicating
properly, network engineers may be notified that a switch went down, a server admin may see a
failover of servers occur, or, worst of all, the IT personnel may receive a direct call from a
user reporting the problem. After the engineer is aware of the incident, a process is followed to
document the incident and begin collecting information relevant to it. As more information is
collected, other engineers who are most likely not aware of the incident are notified. Collecting
the information alone can take a long time, depending upon how well documented the IT environment
is and how up-to-date that information is. Also, each engineer only has a view into the
information related to their own area; a network engineer has no insight into the configuration
of the ERP application and what it depends on to function properly. As a result, more time is
wasted trying to gather the individuals responsible for each piece of the affected service. Once
everyone is together, knowledge sharing begins in order to understand the true impact from a
holistic standpoint. Finally, when everyone understands the system, the incident can be resolved
and the service restored for the customer. This process can sometimes take an entire work day,
costing the company a great deal of money and productivity.

While this is an extreme case, and many services don’t require involving so many parties, this
problem of collecting information, finding all parties to help resolve the incident, and
understanding the larger IT environment is the same every time. It is estimated that up to 87% of
all incidents come from recent changes that have gone wrong. (3) Simply having quicker insight
into the IT environment and how it is changing can save a business hundreds of thousands of
dollars per year as well as help keep its reputation with customers high.

Current Proposed Solution


CMDB, or Configuration Management Database, is an invention of the ITILv2 specification. The
problem it was trying to solve was information being scattered among many unshared informational
silos all around the organization. (4) This hampered incident management efforts, though only
recently to a degree that is visible on a per-incident basis. The CMDB would be a singular
database that would store many auto-discovered Configuration Items, or CIs, that would reflect
the configuration of the IT environment. Non-discoverable CIs would be obtained through
federation. (5)


There are several issues with this approach, however. The first is dependence on auto-discovered
information. While this is great for helping ensure you are aware of every CI in the IT
environment, it should only be used for auditing purposes. The CMDB needs to be the “approved”
state of the IT environment. This means there is a trail of change tickets that can prove the
current state represented in the CMDB is valid. (2) The auto-discovering, or auditing,
functionality violates this “approved” state enforcement and can present users with undesired
views. Ironically, this is usually the most touted feature of current CMDB offerings in the
market.
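
The distinction between the “approved” state and a discovery scan can be illustrated with a small
sketch. This is only an illustration under stated assumptions: the CI names, attribute names, and
change ticket identifiers below are hypothetical, and the audit function is not part of any CMDB
product. It treats discovery purely as an audit input that produces findings, and never lets the
scan overwrite the approved record.

```python
def audit(approved, discovered):
    """Compare the approved CMDB state against an auto-discovery scan.

    Returns findings for an administrator to review; the approved
    state itself is never modified by the scan.
    """
    findings = []
    for ci in sorted(set(approved) | set(discovered)):
        if ci not in approved:
            findings.append((ci, "discovered but not approved"))
        elif ci not in discovered:
            findings.append((ci, "approved but not discovered"))
        elif approved[ci]["attributes"] != discovered[ci]:
            findings.append((ci, "differs from approved state"))
    return findings


# Hypothetical approved state: each CI carries the change ticket that justifies it.
approved = {
    "sw-core-01": {"attributes": {"os": "12.2"}, "change_ticket": "CHG-1001"},
    "db-fin-01": {"attributes": {"version": "11g"}, "change_ticket": "CHG-1002"},
}

# Hypothetical result of a discovery scan of the live environment.
discovered = {
    "sw-core-01": {"os": "12.4"},
    "db-fin-01": {"version": "11g"},
    "srv-unknown-07": {"cpu": 4},
}

findings = audit(approved, discovered)
for ci, problem in findings:
    print(f"{ci}: {problem}")
```

Each finding would then be resolved through a change ticket rather than by accepting the scanned
value directly.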


The second problem with this solution is that many implement federation as a way of just
retrieving information from other systems and copying it to the singular CMDB. This introduces
the chance of stale information and possibly incorrect relations being formed. This can cause
engineers to rely on incorrect information, wasting time in solving issues and causing the
company to lose money and reputation.


The third problem is CMDB solutions’ over-reliance on the auto-discovery feature. In most cases,
CMDB administrators must write numerous adaptors to retrieve CI data from all of their devices.
This single problem is the most expensive to overcome in large organizations, because the CMDB
administrator doesn’t just have to know how to write adaptors for federating with third-party
management systems, which is difficult enough; they must also thoroughly understand every device
and every piece of software within the IT environment. (6) As a result, these adaptors are often
left undone to avoid undue cost, resulting in less information handled by the CMDB that engineers
depend on. Engineers then avoid the CMDB, choosing to stay with their third-party management
system that has all the information they require to do their job.


The final major issue with this solution is flexibility. Most CMDB solutions allow data schemas
to be expanded; however, these solutions don’t allow for logic modifications. (6) For example,
there are fundamental differences between the router configuration architectures of Cisco and
Juniper. Most solutions design their schema and logic around the devices and applications more
commonly found in IT environments. While this may seem like a good business decision, it is a
very limiting design decision. The amount of skill, time, and money it would take to modify the
CMDB to what is needed is simply too costly, not to mention that it would put the system into a
configuration that would no longer be supported by the vendor.

Related Works

Ontology


No matter how the CMDB gets implemented, ontological methodologies must be used. The whole point
of the CMDB is to solve the problem of not having a holistic view of the IT environment. In order
to create this holistic view, there must be an agreed-upon information domain and definitions for
informational attributes within the domain. Numerous systems exist, and each one is usually
implemented differently. Going back to the Cisco and Juniper routers, the ontological definition
of exactly what a router is needs to be carefully analyzed, or else the CMDB either won’t be able
to retrieve information an engineer depends on or, even worse, will create relations that
misdirect an engineer to incorrect areas during incident management. There are also differences
among asset, HR, help ticket, change request, and other management systems. To add to this
problem, there may be multiple systems of the same type implemented in different fashions.
Ontology helps in designing a CMDB system schema that encompasses the entire environment: how
individual entities are defined and how entity relationships are defined. Without using ontology,
there can be no solution to this problem.

Semantic Web


The Semantic Web is a new way of looking at the web, how it is used by machines and people, and
how to retrieve information in answer to inquiries of a more general nature. This methodology
allows one system to ask another system, in a scalable manner, for the specific information it
needs to answer an inquiry posed to it by either another system or an actual user. While this
paper addresses the inter-querying of management systems rather than web systems, the concept is
very much relevant. Also, while this concept is not in the current solution, it needs to be
considered for the long-term viability of the CMDB solution. This is because management systems
will eventually be not only internal to a company but also external, e.g. OS bug lists on the
Internet.


While adaptors must be created in the interim to handle the querying of third-party management
systems, eventually a standard will have to be established for this interaction. There will
always be systems that won’t conform; that is a fact of nature when so many systems from so many
different IT domains need to be integrated. It is not feasible to believe that a standard
integration language could be defined at the device or application level. However, it is feasible
to believe a standard can be developed at the management system level. This allows each IT domain
to determine how its system will be defined and simply provide a façade interface that interacts
using a standardized protocol language.

CMDBf Standard Specification


In order to help in creating a workable CMDB solution, numerous vendors formed the CMDB
Federation group, CMDBf. This was a result of ITILv3 redefining the CMDB as a CMS, or
configuration management system, which is a federated CMDB system. The problem that came about
with this redefinition of the solution from a singular CMDB to a federated CMDB system was how
the CMDB queries the existing management systems. (7) The CMDBf, after 3 years of discussion,
came up with a standard protocol for communication between the CMDB system and third-party
management systems. (6) This protocol gives the ability to retrieve information from systems, as
well as for those systems to register with the CMDB and make it aware of new information. The
specification is XML based and relies on XSD definitions. (8) This allows for extensibility of
ontological definitions for the systems that are being integrated. The CMDBf query defines three
parts: which CI attributes are to be retrieved, a filter on which data should be returned, and a
condition on which relationships the CI attributes take part in. (8) For example, suppose we want
to know the names of the senior DBAs for financial databases within the western region. The CI
attribute would be the administrator names, the filter would be for senior DBAs, and the
relationship condition would be for databases of the financial type and an administrator location
of the western region.
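
The three-part query in the DBA example can be sketched in a few lines of Python. This is an
illustrative model only and does not use the actual CMDBf XML syntax: the Query class, its field
names, the sample records, and the "administers" relationship are all assumptions made for the
sketch.

```python
from dataclasses import dataclass


@dataclass
class Query:
    select: list          # which CI attributes to return
    record_filter: dict   # filter on which records are returned
    relationship: dict    # condition on the relationship the CI takes part in


# Hypothetical data, as it might be returned by two different MDRs.
admins = [
    {"id": "a1", "name": "Avery", "level": "senior", "region": "west"},
    {"id": "a2", "name": "Blake", "level": "junior", "region": "west"},
    {"id": "a3", "name": "Casey", "level": "senior", "region": "east"},
    {"id": "a4", "name": "Devon", "level": "senior", "region": "west"},
]
databases = [
    {"id": "d1", "type": "financial"},
    {"id": "d2", "type": "hr"},
]
administers = [("a1", "d1"), ("a2", "d1"), ("a3", "d1"), ("a4", "d2")]


def matches(record, constraint):
    """True when every key in the constraint has the required value."""
    return all(record.get(key) == value for key, value in constraint.items())


def run(query):
    dbs = {db["id"]: db for db in databases}
    results = []
    for admin in admins:
        if not matches(admin, query.record_filter):
            continue
        if not matches(admin, query.relationship["source"]):
            continue
        related = [dbs[d] for a, d in administers if a == admin["id"]]
        if any(matches(db, query.relationship["target"]) for db in related):
            results.append({attr: admin[attr] for attr in query.select})
    return results


query = Query(
    select=["name"],
    record_filter={"level": "senior"},
    relationship={"source": {"region": "west"}, "target": {"type": "financial"}},
)
result = run(query)
print(result)
```

Only the administrator who is senior, located in the western region, and related to a financial
database satisfies all three parts.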


Virtual Schema Solution

System Description

The proposed solution is to create a system using the mediator methodology. It is important to
state that while this system acts as a singular interface to all the integrated Management Data
Repository (MDR) systems, it does not supplant the direct use of individual MDR systems by users.
This system is meant to solve the problem of identifying CI relationships in a dynamic and
flexible fashion. This is done by encompassing all of the logic that is required to collect and
relate the data that exists within the separate MDRs. Figure 2 shows the implementation structure
of the Central CMDB System.

There are six major parts to this system: User Interface, Query Processor, Query Translator,
Virtual Schema, MDR Query Processor, and an adaptor for each integrated MDR.

The User Interface module is the façade that users and third-party applications interact with.
The queries posed to the system are expected to reflect what the current virtual schema looks
like; if a query does not reflect the current schema, an error will be presented to the user or
application. This is similar to what happens when an invalid query is sent to any current DBMS.
This module is also responsible for managing query sessions initiated to the system. As a result,
there should only be one instance of this module, and it should act as a controller. Also, the
User Interface module is responsible for any translation of the reply into the appropriate data
format: XML, CSV, etc.

Figure 2. Proposed Solution Architecture

The Query Processor is responsible for keeping track of queries passed to it by the User
Interface. This would most likely be implemented by specifying a certain number of instances to
be started to handle query processing. When a query is processed, it is passed to the Query
Translator. The Query Translator will reply either with a set of queries to run, specifying which
MDR each query should be run against, or with an indication that the query could not be
translated. If the query could not be translated, the Query Processor will respond to the User
Interface specifying the error that occurred, and the thread handling the query will be marked as
free to process another query. If the query was successfully translated, each resulting query
will be sent to an instance of the MDR Query Processor module. If any of the queries results in
an error, the Query Processor will reply to the User Interface with the appropriate error and the
thread will be marked as free for a new query. If all of the queries come back successfully, the
results will be compiled as temporary virtual tables, similar in nature to basic database views.
The original query passed to the Query Processor is then run against those temporary virtual
tables to retrieve the end result set. This result set is then returned to the User Interface
module.
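
The last two steps, compiling MDR results into temporary virtual tables and running the original
query against them, can be sketched with Python's built-in sqlite3 module. This is one possible
realization under stated assumptions, not the paper's implementation: the use of SQL as the query
language, the in-memory SQLite database, and the table and column names are all choices made for
this sketch. Nothing is persisted; the tables vanish when the connection closes.

```python
import sqlite3


def process(original_sql, mdr_results):
    """Run the original query against temporary tables built from MDR results.

    mdr_results maps a virtual table name to (column names, rows).
    Table and column names are assumed to come from the trusted
    Virtual Schema, so they are interpolated directly into the DDL.
    """
    conn = sqlite3.connect(":memory:")
    try:
        for table, (columns, rows) in mdr_results.items():
            cols = ", ".join(columns)
            marks = ", ".join("?" for _ in columns)
            conn.execute(f"CREATE TEMP TABLE {table} ({cols})")
            conn.executemany(f"INSERT INTO {table} VALUES ({marks})", rows)
        return conn.execute(original_sql).fetchall()
    finally:
        conn.close()  # the temporary tables are discarded here


# Hypothetical results returned by two MDR queries.
mdr_results = {
    "server": (["name", "switch"],
               [("erp-app-01", "sw-core-01"), ("web-01", "sw-edge-02")]),
    "oncall": (["device", "engineer"],
               [("sw-core-01", "network team"), ("erp-app-01", "ERP admins")]),
}

sql = ("SELECT server.name, oncall.engineer FROM server "
       "JOIN oncall ON oncall.device = server.switch ORDER BY server.name")
rows = process(sql, mdr_results)
print(rows)
```

The join relates data that lives in two separate systems without either system knowing about the
other, which is the role the temporary virtual tables play in the design.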

The Query Translator is responsible for translating the original query posed to the system into
the individual queries that will be posed to the MDR Query Processor. This is where the most
complicated code will end up residing. It starts by retrieving data from the Virtual Schema
module’s Data Schema. After the relevant information is retrieved, it identifies the tables and
attributes whose information must be retrieved from the Virtual Schema module’s Location Schema.
When processing the Location Schema information, the Query Translator can then identify which MDR
contains which attributes and optimally break up the original query. Exactly how this process
occurs is beyond the scope of this paper. Once the original query is successfully broken into
MDR-specific queries, the resulting MDR-to-query set is sent back to the Query Processor. If the
query cannot be broken up, an error is sent back to the Query Processor.
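
The paper leaves the translation algorithm out of scope, so the following is only a deliberately
simplified sketch of the grouping step: given the attributes a query requests, look each one up
in a location mapping and group the native field names by MDR. The function name, the
"entity.attribute" naming, and the MDR and field names are hypothetical, and no query
optimization is attempted.

```python
def translate(requested, location_schema):
    """Group requested 'entity.attribute' names by the MDR that holds them.

    Raises LookupError when an attribute has no known location, which
    corresponds to reporting a translation error to the Query Processor.
    """
    per_mdr = {}
    for attribute in requested:
        if attribute not in location_schema:
            raise LookupError(f"no location mapping for {attribute}")
        mdr, native_field = location_schema[attribute]
        per_mdr.setdefault(mdr, []).append(native_field)
    return per_mdr


# Hypothetical Location Schema: virtual attribute -> (MDR, field name in that MDR).
location_schema = {
    "server.name": ("asset_db", "hostname"),
    "server.owner": ("hr_system", "employee_name"),
    "server.open_changes": ("change_mgmt", "pending_tickets"),
}

plan = translate(["server.name", "server.owner"], location_schema)
print(plan)
```

The returned dictionary plays the role of the MDR-to-query set: one entry per MDR that must be
contacted, listing what to ask it for.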

The Virtual Schema contains the ontological super schema definition of the entire environment,
encompassing all of the individual MDRs that are being integrated. This Virtual Schema is
completely defined by the CMDB administrator. There are two sub-modules to this module: Data
Schema and Location Schema. The Data Schema is virtually identical to a real database schema. The
only difference is that a real database schema is capable of storing data; this Data Schema is
purely an ontological representation of the entire environment. The Location Schema stores the
mapping of each attribute defined in the Data Schema to the real location of the data that
attribute represents.
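
A minimal sketch of the two sub-modules, assuming plain Python dictionaries as the representation
(the paper does not prescribe one). The entity, attribute, MDR, and field names are hypothetical.
The Data Schema holds definitions only, with no data, and the Location Schema says where each
attribute really lives. A simple consistency check then reports attributes the administrator has
defined but not yet mapped.

```python
# Data Schema: entities, their attributes, and their relationships. No data is stored.
data_schema = {
    "Server": {"attributes": ["name", "owner"], "relations": {"hosts": "Application"}},
    "Application": {"attributes": ["name", "tier"], "relations": {}},
}

# Location Schema: (entity, attribute) -> (MDR, field name inside that MDR).
location_schema = {
    ("Server", "name"): ("asset_db", "hostname"),
    ("Server", "owner"): ("hr_system", "employee_name"),
    ("Application", "name"): ("erp_console", "app_name"),
}


def unmapped(data_schema, location_schema):
    """List Data Schema attributes that have no entry in the Location Schema."""
    return [
        (entity, attr)
        for entity, definition in data_schema.items()
        for attr in definition["attributes"]
        if (entity, attr) not in location_schema
    ]


missing = unmapped(data_schema, location_schema)
print(missing)
```

A query that touches an unmapped attribute could not be translated, so a check of this kind is
one way an administrator might validate a Virtual Schema before putting it into use.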

The MDR Query Processor module can have many instances of itself running at the same time. After
the Query Processor relays the MDR-to-query pair information, the MDR Query Processor
instantiates the Adaptor module that corresponds to the correct MDR and runs the query against
that Adaptor module instance. The result of that query, whether a data set or an error, is then
returned to the calling Query Processor instance.

The Adaptor module is part adaptor and part proxy, and is a self-running daemon with multiple
threads. The number of threads could be customized to either be optimized or limited to a number
allowable by the respective MDR. The Adaptor is purely customized code. The code can be provided
by the MDR vendor or created by the CMDB administrator; regardless, it translates the MDR’s API
into an interface the CMDB system is expecting. There will always be one Adaptor module created
per MDR that is integrated.
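
The adaptor idea can be sketched as a common interface with one implementation per MDR. Everything
named here is an assumption for illustration: the Adaptor base class, its query method, and the
FakeAssetApi class, which merely stands in for a vendor API and does not correspond to any real
product. Threading and daemon behavior are omitted.

```python
from abc import ABC, abstractmethod


class Adaptor(ABC):
    """Interface the CMDB system expects from every integrated MDR."""

    @abstractmethod
    def query(self, fields, where):
        """Return a list of dicts, one per matching record."""


class FakeAssetApi:
    """Hypothetical stand-in for a vendor's native API."""

    def find_devices(self, site):
        inventory = [
            {"hostname": "sw-core-01", "site": "west", "model": "X"},
            {"hostname": "sw-edge-02", "site": "east", "model": "Y"},
        ]
        return [device for device in inventory if device["site"] == site]


class AssetAdaptor(Adaptor):
    """Translates the expected interface into calls on the native API."""

    def __init__(self, api):
        self.api = api

    def query(self, fields, where):
        devices = self.api.find_devices(site=where["site"])
        return [{field: device[field] for field in fields} for device in devices]


rows = AssetAdaptor(FakeAssetApi()).query(["hostname"], {"site": "west"})
print(rows)
```

Because the MDR Query Processor only depends on the common interface, an adaptor written by a
vendor and one written by the CMDB administrator are interchangeable from its point of view.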

Advantages

The main advantage this solution delivers is an improvement in how well engineers tasked with
resolving an incident understand the related environment. Going back to the previous ERP incident
example, a lot of time was wasted getting the correct people involved and understanding the
impact. With this system, as soon as an engineer from any team is alerted to an issue, the
engineer can simply query this Virtual Schema CMDB. There is now a complete view of what is or
could be impacted as a result of the current alerts and who should be involved. Through
monitoring integration, alerts from multiple monitoring systems could be automatically correlated
by utilizing information federated by this solution. Then the correct ERP admins, network
engineers, and system engineers would all be notified. Along with the notification, those
individuals would be presented the relevant information from the federated CMDB, although there
is most assuredly running state information the engineers would not be presented along with the
notification. The overall incident management process has just been sped up dramatically, getting
the engineers to resolving the issue rather than wasting time researching.
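
The impact view described above amounts to walking CI relationships outward from the failing
item. A minimal sketch, assuming the federated relationships have already been retrieved into a
simple mapping; the CI names, the dependents mapping, and the owners mapping are hypothetical and
loosely follow the ERP example.

```python
from collections import deque


def impacted(failed_ci, dependents):
    """Breadth-first walk of everything that depends on the failed CI."""
    seen, queue = set(), deque([failed_ci])
    while queue:
        ci = queue.popleft()
        for dependent in dependents.get(ci, []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


# Hypothetical relationships: CI -> CIs that depend on it.
dependents = {
    "sw-core-01": ["erp-db-01"],
    "erp-db-01": ["erp-app"],
    "erp-app": ["customer-service-portal", "on-demand-purchasing"],
}

# Hypothetical ownership data, as it might be federated from another MDR.
owners = {
    "sw-core-01": "network engineers",
    "erp-db-01": "system engineers",
    "erp-app": "ERP admins",
}

affected = impacted("sw-core-01", dependents)
notify = sorted({owners[ci] for ci in affected | {"sw-core-01"} if ci in owners})
print(sorted(affected))
print(notify)
```

A single switch alert is thereby tied to the downstream services it affects and to the teams that
should be notified.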

Another advantage of the Virtual Schema solution is that it reflects the fact that there is no
way to define an ontological super schema for an IT environment made up of any number of MDRs
from an unknown set of vendors. It is relatively safe to assume that although a standard
communication protocol could be developed, it still would be completely incapable of defining any
ontological definition for systems that would take part in this solution. As such, this solution
addresses the need for flexibility of definitions. This means there is no need for vendors to
modify their systems, and the customer knows it will work in their existing environment. The only
remaining work is creating the MDR adaptors and defining the Virtual Schema.


Finally, if a business logic MDR is integrated, incident prioritization can now take place
automatically. This results in engineers working on the most important incidents from a business
impact point of view. Also, management can now have confidence in knowing they no longer have to
check in with their engineers to ensure the proper weight is given to each incident.

Disadvantages


While certain efficiencies could be obtained through algorithmic enhancements, there is large
potential for scalability issues. There is very complex code involved in breaking a query to the
User Interface into the queries that will be sent to the MDRs. Also, this solution is designed to
be daisy-chained, so one Virtual Schema CMDB could be an MDR from the viewpoint of another
Virtual Schema CMDB.


The setup of the Virtual Schema is entirely customized by the customer during each
implementation. Also, each adaptor would have to be customized before an MDR could be integrated.
As a result, the total setup required to use this solution could prove to be a larger expense
than many would be willing to take on. However, part of the beauty of this system is that if
someone builds an adaptor for one system, there is no reason it can’t be used by another customer
after some small modification effort. Also, as with current CMDB systems, there is no reason the
CMDB vendor offering this solution can’t provide a starter Virtual Schema. This is effectively
what the vendor is required to do now anyway; however, now the customer would be able to modify
the schema to reflect their environment.


Finally, this solution lacks standards when it comes to Virtual Schema implementation. This
solution touts flexibility in its current form, but if every implementation is different, the
solution will fail to be adopted. Vendors will be unwilling to support any such solution, as it
would drive costs up too high; it’s effectively supporting as many products as there are
implementations. Even if an administrator of one implementation knows and understands the system
perfectly, when that administrator goes to help out with another implementation they would
effectively have to relearn everything. Essentially, there are high support and maintenance costs
if no standards are developed.



Future Works


There are standardizations as well as best practices that can be developed. First, the Data
Schema definition can be standardized. This would allow vendors to release their own definitions,
including entity and relationship definitions, that could be imported into the Virtual Schema,
thus saving on setup time. The reason for this is that many items, like a Cisco module, are only
capable of being related to a limited number of other devices; however, other items, like a
database, need to allow for more flexible types of relationships.


Another future work is creating a standard development language for the adaptor modules. With a
strict development environment designed for these modules, it should be possible to rule out
potential performance issues. Also, this will help in making these adaptors more easily shared
between customers.


The Query Translator has large potential to hinder the adoption of this solution due to its
potential scalability issues, especially if several of these Virtual Schema CMDBs are chained
together. Much work will need to be performed on algorithmic efficiency to ensure this solution
remains a viable CMDB solution.

The virtual schema ontology could be auto-discovered by scanning all of the MDRs that are being
integrated. Information would be collected within each MDR to define entities and relationships.
As a result, it could be possible to save administrators a lot of setup time by discovering the
Data Schema and Location Schema definitions. Through the development of standards and best
practices, auto-discovery would be more easily realized.

Finally, there is large potential for applying this system to other management needs. Since this
system has the capability of performing impact analysis, it can be integral to the change
management process. With nearly 90% of incidents caused by changes, being able to see how a
change will affect the existing IT environment could have dramatic potential for a business,
saving an organization hundreds of thousands of dollars per year and improving overall
reliability, which would boost reputation and customer loyalty. (3)

Conclusion

The need for on-the-fly impact analysis is essential as the IT industry only becomes more and
more complex. While this solution is not earth-shattering, it utilizes existing works from other
applications, combining them into a solution that can have large benefits in today’s IT
environments. Solutions currently on the market approach this problem in a way that alienates
many MDR vendors and hinders the customer’s ability to fully understand their IT environment at
any instant in time. The proposed solution outlined in this paper helps address the issues with
current offerings. Also, it was shown how this solution more completely addresses the actual
problem by possessing greater flexibility in implementation. While there are still issues to be
addressed, this Virtual Schema solution holds great potential to be the foundation for on-the-fly
impact analysis in environments with non-standardized MDRs, helping confront each ITIL
discipline.

Bibliography

1. Decoufle, Bob. What is the cost of your downtime? Datacenter Journal. [Online] 9 3, 2009.
http://datacenterjournal.com/index.php?option=com_content&view=article&id=3148:what-is-the-cost-of-your-downtime&catid=38&Itemid=43.

2. O'Donnell, Glenn and Casanova, Carlos. The CMDB Imperative. Boston, MA: Prentice Hall, 2009.

3. Cooper, Larry. A CMDB Runs Through IT. 2007, Vol. 3, 25.

4. The CMDB: Relief for Your IT Headaches. Viewpoint. 2005.

5. Achieve ITIL Standards and Superior Application Lifecycle Management with a CMDB. s.l.: Aldon,
2008.

6. England, Rob. The IT Skeptic Looks at CMDB. Porirua: Two Hills, 2009.

7. From silos to services. From silos to services: using CMDB federation to enable an ITIL Version 3
configuration management system. 5 2008.

8. Configuration Management Database Federation Specification. [PDF] s.l.: DMTF, 2009. DSP0252.