Log Analysis Whitepaper

2008


Matt Mortimer

mortimer@fsu.edu

Table of Contents

Introduction / Summary
Background
Problems and Trends
Solution
Conclusion
Works Cited


Introduction / Summary

The abundance of technology in today’s market produces a mountain of data that underscores the
Information Age that we live in. Every action or event that takes place in the digital world
potentially outputs information that is then analyzed and scrutinized to produce a potential
reaction. Take a look around you. Almost anything you see today can output data. Hardware such
as cell phones, routers and thermometers, and software such as websites, email and games, can
and will automatically output data about its use and the environment that it operates in.


The output of data can be recorded in a multitude of different ways. Commonly, the output is
stored in a flat file, often with comma-separated values, referred to as a log file or a CSV
file. Other recording formats include anything from complex relational databases to raw text.
Whether the output is recorded in a flat file or a relational database, it is still called a log
by the technology community because it logs data pertaining to a single event or a series of
events.


Analyzing data in a log, whether it is kept in a relational database or a text file, is very
difficult to do without the support of an external utility designed for analyzing such data.
There are many programs available for analyzing log files, but because of the large number of
different log file formats and the variety of information the log files contain, the market
ranges from tools built for one specifically formatted log file to tools that support an
extensive array of log file formats. It is not uncommon for a user to have several different log
files needing analysis or conversion into a readable format. It is therefore important to choose
a utility that not only supports a wide range of log formats but can also output the analysis in
a format that is easily understood by the user.


This whitepaper will focus primarily on technology users in the enterprise environment with
common log formats such as the W3C format and other formats used by Microsoft, Linux and IBM.
A recommendation as to which utility provides the best type of log analysis for these log formats
will be described in detail after careful comparison to competitors available in the market today.

Background

The output of data is commonly known as a log file. The word log was first adopted in 1963 “to
describe the systematic recording of specific types of data processing events” (Wikipedia, 2008).
Since its rebirth into its current form, we have been privy to the growth of technology and may
not be aware of the amount of data we create on a daily basis. If you use a computer, make a
bank transaction, watch TV, use a phone, drive a car or do any of a multitude of other everyday
things, then you are creating records of data about your use of and interaction with a
particular item of technology. Because of the abundance of different types of electronics and
other forms of technology, there are several hundred different data logging standards.


Usually, log files are designed for quick storage of information, and in most cases this
information is event driven, so records of data quickly add up in a log file. Since the log file
is meant as a quick and dirty way of storing this data, most users don’t take the time to
analyze the data and transform it into a chart or graph. For example, when managing a website
you can find valuable data in the log file to “find insights about [the] site’s usability,
errors in [the] HTML code, the popularity of [certain] site pages and the type of visitors [the]
site attracts” (Jordan, 2005).



For IIS logs there is certain data you need to keep an eye on, especially if you are advertising
your site on several search engines. If you are, then you need to keep an eye on referring sites
and search engines so you know which search engine is providing the most return on your dollar.
You can use this to move money you’re spending on search engines that aren’t producing any
results into areas where you are getting results.
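
As a rough illustration of the referrer check described above, the following Python sketch counts
hits per referring host in a W3C extended log. It is not part of any product discussed in this
whitepaper. It assumes the log carries a #Fields directive that includes the cs(Referer) field,
and the sample log lines are invented for the example.

```python
from collections import Counter
from urllib.parse import urlsplit


def parse_w3c(lines):
    """Yield one dict per log entry, keyed by the names in the #Fields directive."""
    fields = []
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
        elif line and not line.startswith("#") and fields:
            values = line.split()
            if len(values) == len(fields):
                yield dict(zip(fields, values))


def referrer_hosts(lines):
    """Count hits per referring host, skipping entries with no referrer ("-")."""
    counts = Counter()
    for entry in parse_w3c(lines):
        referrer = entry.get("cs(Referer)", "-")
        if referrer != "-":
            host = urlsplit(referrer).hostname
            if host:
                counts[host] += 1
    return counts


# Invented sample data, for illustration only.
SAMPLE_LOG = """\
#Software: Microsoft Internet Information Services 6.0
#Fields: date time cs-method cs-uri-stem c-ip cs(Referer) sc-status
2008-03-28 10:15:02 GET /index.html 10.0.0.5 http://www.google.com/search?q=tiramisu+recipe 200
2008-03-28 10:16:40 GET /flan.html 10.0.0.6 http://www.google.com/search?q=flan 200
2008-03-28 10:17:11 GET /index.html 10.0.0.7 http://search.example.com/?q=desserts 200
2008-03-28 10:18:29 GET /about.html 10.0.0.5 - 200
""".splitlines()

for host, hits in referrer_hosts(SAMPLE_LOG).most_common():
    print(f"{host}: {hits}")
```

Comparing these counts with what you pay each search engine gives a first, approximate view of
which one is earning its keep.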


Using W3C extended logging in IIS, you will also receive the search phrases used by visitors
who reached your page via a search engine. This too is invaluable because it tells you which
phrases you need to focus on and which ones you can eliminate. In some cases, you pay per
phrase when using a search provider’s services. For example, if you sold desserts on your
website and visitors are hitting your site via a search engine, it is vital to keep an eye on
the search phrases people use to find your site. If you find that these search phrases don’t
include words such as tiramisu or flan, then you can eliminate those phrases from your metadata
and/or the search provider’s plan and add other phrases that resemble the popular search phrases.
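
The search phrase usually travels in the query string of the referring URL. The Python sketch
below shows one possible way to pull it out and tally it. The parameter names in PHRASE_PARAMS
are assumptions (search engines differ), and the referrer values are invented for the example.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit

# Assumed names of the query-string parameter that carries the search phrase.
PHRASE_PARAMS = ("q", "query", "p")


def search_phrase(referrer):
    """Return the lower-cased search phrase in a referrer URL, or None."""
    query = parse_qs(urlsplit(referrer).query)
    for name in PHRASE_PARAMS:
        if name in query:
            return query[name][0].strip().lower()
    return None


def phrase_counts(referrers):
    """Count how often each search phrase brought a visitor to the site."""
    counts = Counter()
    for referrer in referrers:
        phrase = search_phrase(referrer)
        if phrase:
            counts[phrase] += 1
    return counts


# Invented referrer values, for illustration only.
SAMPLE_REFERRERS = [
    "http://www.google.com/search?q=tiramisu+recipe",
    "http://www.google.com/search?q=Tiramisu+recipe",
    "http://search.example.com/?query=flan",
    "http://www.example.org/links.html",
]

for phrase, hits in phrase_counts(SAMPLE_REFERRERS).most_common():
    print(f"{phrase}: {hits}")
```

Phrases you pay for that never show up in such a tally are candidates for removal.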


The W3C IIS log file can also tell you the most accessed pages on your site. This tells you
what people are looking for the most, and you can either make it more readily available or
feature it on your homepage. Analyzing this information may tell you that “visitors [are]
entering and navigating your site in ways you didn’t intend” (Jordan, 2005). As a result, you
may find that you should redesign your site for these scenarios or optimize other pages for such
keywords or phrases.


Even in analyzing other log file types you can discover valuable information about the use,
problems and/or needs of technology in your enterprise. Beyond assessing data in IIS logs, you
could find that your Microsoft VPN server, which also logs its data using the W3C format, is
being misused, attacked or overstressed. Simply disregarding the data in the various log files
in the enterprise is a severe mistake that administrators cannot afford to make.




Problems and Trends

Because of the abundance of formats and standards for event-based data or log files, most
software is designed to accomplish a limited number of specific tasks. This is not a problem
when searching for a solution to one particular task. For example, many users find themselves
needing to analyze Microsoft IIS logs on a web server. At first glance there are many products to
choose from, but digging a little deeper reveals that each product offers a different kind of
analysis.


A quick Google search for “IIS log analysis” returns several providers for this type of log
analysis. One of the first ten matches returned was VisitorVille, which takes a SimCity-style
gaming approach to viewing your IIS traffic. This product offers a 3D interface to live data on
your website (see Figure 1).


Figure 1



“Each building represents a web page; each bus represents a search engine; and each animated
character [is] a real visitor [on] the site” (VisitorVille, 2007). While this type of analysis
may be a fun and interesting way of viewing the data in your IIS log file and is unlike any other
log analysis utility, the visualization is too complex. It may be construed as overkill for
someone looking for a quick statistical analysis of data that can be shared with co-workers or
employees. VisitorVille works great in a stand-alone environment where only the administrator
will be analyzing the data, but it is not well suited to a corporate environment.



At the other end of the spectrum of Google results for the “IIS log analysis” search,
Sawmill.net offers a very robust product. Its ability to process almost any log file is a big
bonus to those who manage a large environment with different technologies. Unlike VisitorVille,
Sawmill offers a professional statistical output that can be easily shared with others (see
Figure 2).


Figure 2



Products like this are the answer to the problems that products like VisitorVille pose. Also,
Sawmill is a web-based application, so it is easily accessible, yet another benefit to the user.
However, the benefits stop there. In testing with a collection of IIS logs accumulated over a
year, Sawmill took several days to process the data into the statistical output. This kind of
response time is not acceptable, and the processing puts heavy strain on the host’s resources
(i.e., processor and memory).







Solution

The best solution is to find an application that can analyze several different log formats and
present the results in a readable format that enables the end user to turn the data into usable
information in a timely manner. Microsoft’s LogParser 2.2 does that and more.



LogParser is not limited to one log file type or standard. It uses an SQL-like query core (see
the example in Figure 6) to query data from a wide range of sources, such as Windows Event logs,
the Windows Registry, Active Directory and CSV (comma-separated values) files. For output,
LogParser can convert the log data into pictorial charts, graphs or an SQL database as directed
by the query (Figure 3).
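
To give a feel for what querying log data with SQL looks like, here is a small Python sketch that
loads CSV log rows into an in-memory SQLite database and asks for the most requested pages. This
is not LogParser and does not use LogParser syntax. SQLite only stands in for the general idea,
and the table name, column names and sample rows are invented for the example.

```python
import csv
import io
import sqlite3

# Invented CSV log data, for illustration only.
SAMPLE_CSV = """\
date,page,status
2008-03-28,/index.html,200
2008-03-28,/flan.html,200
2008-03-28,/index.html,200
2008-03-29,/missing.html,404
2008-03-29,/index.html,200
"""


def load_csv(connection, text):
    """Load CSV rows into a table named hits (a hypothetical table name)."""
    connection.execute("CREATE TABLE hits (date TEXT, page TEXT, status INTEGER)")
    rows = csv.DictReader(io.StringIO(text))
    connection.executemany("INSERT INTO hits VALUES (:date, :page, :status)", rows)


def top_pages(connection, limit):
    """Return (page, hit count) for successful requests, most requested first."""
    query = """
        SELECT page, COUNT(*) AS hit_count
        FROM hits
        WHERE status = 200
        GROUP BY page
        ORDER BY hit_count DESC, page
        LIMIT ?
    """
    return connection.execute(query, (limit,)).fetchall()


connection = sqlite3.connect(":memory:")
load_csv(connection, SAMPLE_CSV)
for page, hit_count in top_pages(connection, 5):
    print(page, hit_count)
```

The appeal of the SQL approach is that filtering, grouping and sorting are all expressed in one
statement instead of in hand-written loops.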


Figure 3



Aside from LogParser’s capabilities, one major driving factor is the fact that this utility is
completely free. Sawmill, by contrast, ranges from $99 to $30,000 depending on the feature pack,
the number of users and the number of logs you wish to analyze. In addition, Sawmill charges a
25% yearly maintenance fee, with a minimum of $100. VisitorVille ranges from $14.99 to $49.99
depending on the maximum number of daily unique visitors on your website; however, VisitorVille
is limited to webpage log analysis only.


When price doesn’t matter, free doesn’t matter either. You have read about two of the largest
competitors for log analysis. This whitepaper has described the input and output formats
supported by each. You have also read about some of their potential drawbacks. Now, please brace
yourself as you are about to witness LogParser blow the competition out of the water.


LogParser is scalable, meaning you could easily implement LogParser into an existing
environment with minimal impact and expand its functionality as little as you want or as much
as needed. One nice feature Sawmill has over some of the competition is its availability. As
discussed before, Sawmill is a web application, which is very useful in the enterprise where
multiple people are required to evaluate log data. Because of LogParser’s flexibility, you can
easily implement LogParser into your existing website or into a new one (see Figure 4).



Figure 4



Using basic HTML you can create a form to gather the information used to filter through an
IIS log file, such as the date and the file accessed. Then you can call an ASP or JavaScript
function (see Figure 5) from the form’s post that shells out the LogParser command (see the
shell example in Figure 6). The end result is a simple form-based interface tailored to your own
website. This is a solution that is flexible enough to do what you need and scalable enough to
fit anywhere you want to put it. Packaged together with Microsoft’s unprecedented amount of
documentation, support and knowledge base, you have the solution that is right for you.
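
The filtering step behind such a form can be sketched in a few lines of Python. This is only a
stand-in for the idea. It does not reproduce the ASP function or the LogParser command from the
figures. The form field names (date, page) and the sample log lines are assumptions made for the
example, and the submitted values are validated before they are used.

```python
from datetime import datetime


def parse_w3c(lines):
    """Yield one dict per log entry, keyed by the names in the #Fields directive."""
    fields = []
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
        elif line and not line.startswith("#") and fields:
            values = line.split()
            if len(values) == len(fields):
                yield dict(zip(fields, values))


def filter_entries(lines, date_text, page):
    """Return entries matching a date (YYYY-MM-DD) and a requested file.

    Raises ValueError when the submitted date is not a real calendar date
    or the page does not look like a site-relative path.
    """
    date = datetime.strptime(date_text, "%Y-%m-%d").date().isoformat()
    if not page.startswith("/") or any(ch.isspace() for ch in page):
        raise ValueError(f"not a site-relative path: {page!r}")
    return [
        entry for entry in parse_w3c(lines)
        if entry.get("date") == date and entry.get("cs-uri-stem") == page
    ]


# Invented sample data, for illustration only.
SAMPLE_LOG = """\
#Fields: date time c-ip cs-method cs-uri-stem sc-status
2008-03-28 10:15:02 10.0.0.5 GET /index.html 200
2008-03-28 10:16:40 10.0.0.6 GET /flan.html 200
2008-03-28 11:02:13 10.0.0.7 GET /index.html 200
2008-03-29 09:30:00 10.0.0.5 GET /index.html 200
""".splitlines()

form = {"date": "2008-03-28", "page": "/index.html"}  # values as posted by the form
for entry in filter_entries(SAMPLE_LOG, form["date"], form["page"]):
    print(entry["time"], entry["c-ip"], entry["cs-uri-stem"])
```

Whatever does the filtering, checking the form values first matters most when they end up inside
a command that is shelled out.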


Figure 5 (ASP function)



Most scenarios that require analysis of multiple log formats take place in the enterprise
environment. In an enterprise, or as an administrator, you may find several implementations of
tools or utilities that are customized or scripted through the command line. Managing a network,
a few servers, or several computers at some point requires the use of several scripted or
command line tools. This is not a new practice, and there are thousands of support documents if
you are unfamiliar with how this works. For specific help with LogParser, please visit the
Microsoft support site and download the LogParser documentation. If you are worried that
LogParser might not be available in future versions of Windows, then please take a look at this
example of Windows Server 2008 with IIS 7.0 and a Vista sidebar gadget implementation of
LogParser.




Although LogParser does not have a polished GUI like Sawmill or VisitorVille, it is easy
enough to execute or shell out to through the command line. Taking the IIS W3C log format as an
example again, you can see how a command line statement (see Figure 6), written with just a
minimal amount of command line and/or scripting experience, can transform IIS log data (see
Figure 7) into a readable and usable graph (see Figure 8).


Figure 6 (LogParser command line)

Figure 7 (Log file data)

Figure 8 (Chart output)
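
The path from raw log lines to a chart can also be sketched in Python. The example below counts
hits per hour from a W3C log and prints them as text bars. It is a simplified stand-in and not
the LogParser command or chart shown in the figures. The sample log lines are invented, and the
log is assumed to carry a time field in its #Fields directive.

```python
from collections import Counter


def hits_per_hour(lines):
    """Count log entries per hour of day, using the time column of a W3C log."""
    time_index = None
    counts = Counter()
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            names = line[len("#Fields:"):].split()
            time_index = names.index("time") if "time" in names else None
        elif line and not line.startswith("#") and time_index is not None:
            values = line.split()
            if len(values) > time_index:
                counts[values[time_index][:2]] += 1
    return counts


def text_chart(counts, width=40):
    """Render counts as horizontal bars scaled so the largest spans width."""
    largest = max(counts.values(), default=0)
    if largest == 0:
        return ""
    rows = []
    for label in sorted(counts):
        bar = "#" * max(1, round(counts[label] * width / largest))
        rows.append(f"{label}:00 {bar} {counts[label]}")
    return "\n".join(rows)


# Invented sample data, for illustration only.
SAMPLE_LOG = """\
#Fields: date time cs-method cs-uri-stem sc-status
2008-03-28 10:15:02 GET /index.html 200
2008-03-28 10:16:40 GET /flan.html 200
2008-03-28 10:48:51 GET /index.html 200
2008-03-28 11:02:13 GET /index.html 200
2008-03-28 14:20:07 GET /tiramisu.html 200
2008-03-28 14:45:30 GET /index.html 200
""".splitlines()

print(text_chart(hits_per_hour(SAMPLE_LOG), width=12))
```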




Conclusion

If you are willing to put in a little effort to make LogParser work in your enterprise
environment, whether as a permanent, customized solution or on a command line, as-needed basis,
then LogParser is right for you. Why put up with the headaches of trying to implement another
costly, time-consuming solution? LogParser has everything you need, from price to supported log
formats to output charts and spreadsheets. The bottom line is that LogParser was built and is
supported by Microsoft. If you’re operating in a Microsoft environment and are not afraid of a
solution that is not an “out-of-the-box” or turnkey solution, then you have nothing to be afraid
of.



Works Cited

Wikipedia. (2008, March 28). Data logging. Retrieved April 1, 2008, from Wikipedia, the free
encyclopedia Web site: http://en.wikipedia.org/wiki/Log_file

Jordan, Jerry. (2005, February 16). Log File Analysis and SEO. Search Engine College Article
Library. Retrieved April 1, 2008, from
http://www.searchenginecollege.com/articles/2005/02/log-file-analysis-and-seo.html

VisitorVille. (2007). VisitorVille: fun, accurate, professional stats for your website. Retrieved
April 1, 2008, from visitorville Visual Website Analytics - Web Stats Meets Videogame Web site:
http://www.visitorville.com/