DMCA Deployment Setup



Project Configuration

I. Resources

A. Dev: localhost macbook pro

1. mysql: localhost:3306/dmca

2. dmca processor: run "Main" from the netbeans project; the "conf.local" configuration has been added to the classpath so that the dev configuration is used

3. net2user: local tomcat instance (start it by running the shell script "startup.sh", or "startup_d.sh" to start with tomcat remote debugging enabled; shut it down with "shutdown.sh"; these scripts are added to PATH)

a. deployed to /opt/tomcat/webapps

4. dmca web application: deployed to tomcat at /opt/tomcat/webapps

B. Stage

1. Muffin:

a. Mysql: muffin.uchicago.edu:3306/dmca

b. dmca processor

c. net2user

i. Web service: https://muffin.uchicago.edu:8443/net2user

ii. Splunk web: https://muffin.uchicago.edu:8000

iii. Splunk web service API: https://muffin.uchicago.edu:8089

2. Jiminy:

a. dmca web application for acks

i. student ack: https://faux-dmca.uchicago.edu/acknowledge

ii. csl ack: https://faux-dmca.uchicago.edu/csl/acknowledge

C. Prod

1. Spunky:

a. Mysql: spunky.uchicago.edu:3306/dmca

b. dmca processor

c. net2user:

i. web service: https://spunky.uchicago.edu:8443/net2user

ii. splunk web: https://spunky.uchicago.edu:8000

iii. splunk web service api: https://spunky.uchicago.edu:8089

2. Grillo:

a. dmca web application for acks

i. student ack: https://dmca.uchicago.edu/acknowledge

ii. csl ack: https://dmca.uchicago.edu/csl/acknowledge


II. Svn

A. https://versions.uchicago.edu/svn/integration/iteco/dmca/trunk

1. Also, some of the C programs statically link to the utilsc library that is at https://versions.uchicago.edu/svn/integration/iteco/utilsc/trunk

2. The utilsc library reference in Netbeans is configured to point to /Users/jmontgomery/Documents/Work/utilsc on the filesystem, relative to the c project roots at /Users/jmontgomery/Documents/Work/DMCA/dmca-trunk/c/<project>

B. Multiple projects

1. Root checkout contains the java code responsible for dmca notice processing

a. Ant build.xml script

i. dev, test, prod targets for creating builds appropriate for those environments

ii. Artifacts

a. dmca.zip: notice processing

b. dmca.tgz: notice processing (use a. or b.)

c. dmca.war: web deployment for user ack

2. /net2user

a. web application that integrates with splunk to provide a simple web service api that identifies a user session given an ip, time, and optionally port

b. Similar ant build.xml for dev, test, prod environments

3. c/datacopy

a. copies the relevant syslog files (perfigo, dhcp, cvpn, nat-service-block) over the specified dates from syslog-0 to the splunk static inputs data directory, to be indexed by the splunk engine

b. C program

c. Usage is <startDate yyyy-mm-dd> <endDate> [perfigo | dhcp | cvpn | nat-service-block]

i. List the sources explicitly to limit the import to only those sources, or leave them out to import all sources (example invocation below)

ii. Data is copied from the log files on syslog-0 to the appropriate static input directory on muffin or spunky: /dmca/data/splunk/staticinputs
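For example, a back-fill of perfigo and dhcp data for the first three days of June could look like the following (the dates and source selection are illustrative; run from the directory where the datacopy binary is deployed):

./datacopy 2011-06-01 2011-06-03 perfigo dhcp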

4. c/radacctdump

a. Script integration with splunk to dump radius accounting db sessions every minute for processing by splunk

b. C program

c. Deployed at /opt/splunk/bin/scripts on the servers

d. Has a conf.ini file that contains configuration properties for connecting to the database, and a "last.txt" file that contains a unix timestamp of the last successful dump

III. /dmca/app/

A. dmca

1. Deployment location of artifact "dmca.zip": unzipped in the parent /dmca/app directory

B. mail

1. save_msg.pl

a. simple perl script invoked by procmail to dump input mail messages from stdin to a file in /dmca/data/mail/received (see the procmail sketch below)
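A minimal sketch of the kind of procmail recipe that would invoke it; the recipe actually installed on the server may differ, and the script path assumes the /dmca/app/mail location described above:

:0
| /dmca/app/mail/save_msg.pl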

C. splunk

1. datacopy: "datacopy" artifact described in II.B.3

IV. /dmca/data/

A. dmca.lock: a "lockfile" used by the dmca processor to ensure that only one dmca process is running at any given time

B. mail/

1. error

a. contains notice emails that generated an uncaught exception in the dmca processing code, indicating a bug in the code

b. An email is also sent out to a configured list of subscribers with the details of the error

2. outbox

a. Emails generated by dmca but not yet sent to the mail system

3. processed

a. Notice emails successfully processed by the dmca handler

4. received

a. Notice emails processed by procmail but not yet processed by the dmca handler

5. sent

a. Emails generated by dmca that have been sent through the mail system

6. tmp

a. devnull.eml contains emails that were filtered by procmail

C. mysql/

1. root directory for mysql table storage

D. splunk/

1. indexes/

a. root data directory for splunk indexing

2. staticinputs/

a. root data directory for splunk static inputs

b. cvpn, dhcp, pat, perfigo, radius

E. logs/

1. app/

a. dmca processing, net2user, and procmail log files

2. mysql/

a. mysql logs

3. splunk/: unused; splunk logs are in the root splunk installation

F. stage/

1. staging area for testing




net2user configuration on muffin and spunky


I. Splunk

A. Setup

i. Must /admin/sudo/become_splunk to become the splunk user to perform these operations

ii. Copy the search configuration in the project directory splunk/etc/apps/search/local recursively to /opt/splunk/etc/apps/search

iii. Copy splunk/etc/apps/web.conf to /opt/splunk/etc/system/local

iv. Create ssl keys for the splunk front-end

1. Put in /opt/splunk/share/splunk/certs

2. openssl genrsa -out privkey.pem 2048

3. openssl req -new -x509 -key privkey.pem -out cert.pem -days 3650

v. create symbolic links /opt/splunk/splunk_data_indexes -> /dmca/data/splunk/indexes and /opt/splunk/staticinputs -> /dmca/data/splunk/staticinputs (example commands after this list)

vi. logs should be linked to /dmca/logs/splunk

vii. /dmca should be 770 and have g+s set

viii. change the default index location to /opt/splunk/splunk_data_indexes in the splunk admin interface

ix. change the license to enterprise after starting splunk for the first time, through the admin interface
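For items v and vii, the shell commands would be along these lines (a sketch; assumes the /dmca directory tree already exists and that the commands are run with sufficient privileges):

ln -s /dmca/data/splunk/indexes /opt/splunk/splunk_data_indexes
ln -s /dmca/data/splunk/staticinputs /opt/splunk/staticinputs
chmod 2770 /dmca    # 770 plus the setgid (g+s) bit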

B. Index copying

i. Just remove the old data in /dmca/data/splunk/indexes and copy the new indexes over to that location

1. The original splunk server should be shut down before making a copy, and the new splunk server should be stopped while making the copy
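A sketch of one way to perform the copy (the source host placeholder and the use of rsync are illustrative; both splunk instances must be stopped first):

rm -rf /dmca/data/splunk/indexes/*
rsync -a splunk@<old-server>:/dmca/data/splunk/indexes/ /dmca/data/splunk/indexes/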

II. Tomcat

A. remove all other webapps -> manager, ROOT, admin, examples, etc. -> move them to another directory

B. /opt/apache-tomcat/conf

i. Create ssl keys for tomcat https

ii. keytool -genkeypair -alias tomcat -dname "cn=dmca, ou=its, o=uchicago, c=US" -validity 3650 -keystore key.store -keyalg RSA -keysize 1024

iii. http://tomcat.apache.org/tomcat-6.0-doc/ssl-howto.html for additional info

C. Modify server.xml in /opt/apache-tomcat/conf

<!-- A "Connector" represents an endpoint by which requests are received
     and responses are returned. Documentation at :
     Java HTTP Connector: /docs/config/http.html (blocking & non-blocking)
     Java AJP Connector: /docs/config/ajp.html
     APR (HTTP/AJP) Connector: /docs/apr.html
     Define a non-SSL HTTP/1.1 Connector on port 8080
-->
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443" />

<!-- A "Connector" using the shared thread pool -->
<!--
<Connector executor="tomcatThreadPool"
           port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443" />
-->

<!-- Define a SSL HTTP/1.1 Connector on port 8443
     This connector uses the JSSE configuration, when using APR, the
     connector should be using the OpenSSL style configuration
     described in the APR documentation -->
<Connector protocol="org.apache.coyote.http11.Http11Protocol"
           port="8443" SSLEnabled="true"
           maxThreads="20" scheme="https" secure="true"
           clientAuth="false" sslProtocol="SSLv3"
           keystoreFile="/opt/tomcat/conf/key.store"
           keystorePass="dmca_splunk"
           />

<!-- <Connector port="8009" protocol="AJP/1.3" redirectPort="8443" /> -->


D. Modify tomcat-users.xml in /opt/apache-tomcat/conf

<role rolename="dmca"/>
<user username="dmca" password="splunk" roles="dmca"/>





Web application configuration

I. Tomcat

a. /opt/apache-tomcat/conf/server.xml

i. Disable all connectors except the ajp connector on 8009

ii. <!-- Define an AJP 1.3 Connector on port 8009 -->
    <Connector port="8009" protocol="AJP/1.3" redirectPort="8443" tomcatAuthentication="false" enableLookups="false"/>

b. dmca.war deployed to /opt/apache-tomcat/webapps

II. Apache

a. /etc/httpd/conf.d/proxy_ajp.conf

ProxyPass /csl/acknowledge ajp://localhost:8009/dmca/cslack
ProxyPassReverse /csl/acknowledge ajp://localhost:8009/dmca/cslack

ProxyPass /acknowledge ajp://localhost:8009/dmca/studentack
ProxyPassReverse /acknowledge ajp://localhost:8009/dmca/studentack

ProxyPass /dmca ajp://localhost:8009/dmca
ProxyPassReverse /dmca ajp://localhost:8009/dmca

b. Create ssl certs for apache (see the sketch below)
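One possible way to generate a self-signed cert and wire it into mod_ssl, mirroring the openssl commands used for the splunk front-end above (filenames and the ssl.conf location are illustrative; a CA-signed certificate would normally be used for the public acknowledgement URLs):

openssl genrsa -out /etc/httpd/ssl/server.key 2048
openssl req -new -x509 -key /etc/httpd/ssl/server.key -out /etc/httpd/ssl/server.crt -days 3650

# then point mod_ssl at the files, e.g. in /etc/httpd/conf.d/ssl.conf:
#   SSLCertificateFile /etc/httpd/ssl/server.crt
#   SSLCertificateKeyFile /etc/httpd/ssl/server.key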



Database Configuration

All MySQL db scripts are in the "sql" directory of the svn checkout.

Scripts:

o dmca_schema.sql: ddl script to create the dmca schema

o procedures.sql: contains triggers and procedures for use with the dmca schema. This script should be run after "dmca_schema.sql" when recreating the database

o dmca_clean_tables.sql: removes all data from the dmca schema; used for development and staging




Common Scenarios

Expunging a record

Use Case: A subject refutes a notice sent to them. IT Services investigates and determines that the subject was wrongfully identified.

Objective: Remove the subject's association to that notice in the database. If no other notices are associated with the "cluster", then the subject's association with that cluster is removed and the cluster is deleted from the database. If no other clusters are associated with that subject, then the subject entry is deleted from the database. Ticket history will be retained for that notice for reporting purposes, and an additional entry noting the wrongful identification will be stored.


A stored procedure has been created to take care of this use case: "dmca_delete_user_ref(IN l_ticket_id BIGINT)". Call this stored procedure from the mysql command prompt, or run it as a script in MySQL Administrator, with "call dmca_delete_user_ref(<ticket_id>);", passing the ticket id of the notice that was cleared for the subject.
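For example, if the cleared notice had ticket id 12345 (an illustrative value):

call dmca_delete_user_ref(12345);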




Investigating a notice

Use Case: Sometimes a subject will dispute the validity of a notice, and currently IT Services must individually investigate these.

Objective: Establish the validity of the notice. If the notice is deemed invalid, then engage the "Expunging a record" use case noted above. Otherwise, an email needs to be sent back to the subject saying that their case was investigated, the notice was valid, and they must follow the instructions given in the notice.


There are two steps to take during the investigation. First, assess the validity of the timestamp. The ACNS schema does not have a specification for the time server that the provided timestamp was synced to. We have found timestamps off by more than 20 minutes in the past. As a general rule, if the timestamp occurred less than 10 minutes before the start of their login session and less than 10 minutes before they logged out, then they are exonerated if there are no other outstanding notices associated with them. A second step that can be taken is to investigate the actual network traffic in the 20 minute time window of the timestamp and look for p2p activity. Lancope StealthWatch is used for this purpose on smack.uchicago.edu. Talk to Jim Clark for details.

To accomplish step one, a "special" version of net2user was created. I say special just because the code is only available on the staging server and has yet to be moved to production. It adds the additional "end" parameter so that a time range can be specified. All users with sessions that overlap the given [time,end) interval are returned in chronological order. The login and logout times of the sessions can then be examined to see if the timestamp is close enough to the session boundaries to be disputable.


Example:

John Doe (jdoe) has a notice on 6/1/2011 5:00pm CST on ip 128.135.230.11.

The query would then be:

https://muffin.uchicago.edu/net2user?ip=128.135.230.11&time=2011-06-01T16:50:00&end=2011-06-01T17:10:00

The output will then be zero or more "user" elements corresponding to all the auth sessions that were active between 4:50pm and 5:10pm.
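From a shell, the same query can be issued with curl (the -k flag is only needed if the endpoint presents a self-signed certificate):

curl -k "https://muffin.uchicago.edu/net2user?ip=128.135.230.11&time=2011-06-01T16:50:00&end=2011-06-01T17:10:00"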






Splunk Tutorial

Additional documentation:

- http://www.splunk.com/base/Documentation

- In the "doc" folder of the svn checkout

o Splunk-4.1-Developer.pdf
  Information on the splunk REST web service API used to build net2user

o Splunk-4.1.7-User.pdf
  User guide for working with splunk configuration and front-end usage

o Splunk-4.1.7-SearchReference.pdf
  Good reference for a comprehensive list of search commands and syntax


The purpose of splunk is to collate all the authentication log info so that a single service can be used to query about an authentication session. Currently the process is only implemented for wireless and vpn sources, but plans were in place to expand to wired sources, including dhcp and static as necessary. I say as necessary, because at least 2/3 of the notices are coming from wireless, and most of the notices are coming from students who make heavy use of wireless. It may not be required to implement wired sources at all. I did begin working with Ryan Milner to integrate with these sources. We had an initial meeting about it, and it looked like it would require integration with infoblox, dhcp logs, and an internal database that they maintain.

In order to support loading of past data and to deal with outages, the data comes in two different ways: statically from a file dropped in a directory, or in real-time through udp from the source. A third form comes from a script croned by Splunk. This third form was necessary because Splunk only supports streaming of text data directly, so to get info out of a database a wrapper script had to be written to dump the data to text. The static inputs are needed only if there is a server outage where log catch-up is required. Use the "datacopy" utility to import the required logs.




Currently, the integration consists of:

Environment: Production on spunky

Source type             Static transfer                                 Real-time transfer
cvpn                    /dmca/data/splunk/staticinputs/cvpn             205.208.120.4 to udp 11001
nat-service-block       /dmca/data/splunk/staticinputs/pat              128.135.100.100 to udp 11002
perfigo                 /dmca/data/splunk/staticinputs/perfigo          syslog-0 to udp 11004
dhcp                    /dmca/data/splunk/staticinputs/dhcp
radius accounting db    script at /opt/splunk/bin/scripts/radacctdump




dhcp is not streamed in real-time yet because it will only provide useful data if the wired sources are implemented, and the only useful token it would provide is the mac address. The wireless sources all give their mac addresses in their authentications, so it is unnecessary. For cvpn, it is not possible to retrieve the mac address without resorting to arp table queries, since the cvpn operates at the application layer and Cisco has said it is not possible to extract this info. Currently there is no use case for the mac address anyway.


Source type             Static transfer                                 Real-time transfer
cvpn                    /dmca/data/splunk/staticinputs/cvpn             205.208.120.4 to udp 11001
nat-service-block       /dmca/data/splunk/staticinputs/pat              128.135.100.100 to udp 11002
dhcp                    /dmca/data/splunk/staticinputs/dhcp
perfigo                 /dmca/data/splunk/staticinputs/perfigo
radius accounting db    script at /opt/splunk/bin/scripts/radacctdump



The important configuration files for splunk are stored locally in the repository at path splunk/etc/apps/search/local, and remotely on stage and prod at /opt/splunk/etc/apps/search/local:



- Indexes.conf
o Configures the splunk db for dmca and specifies its location

- Inputs.conf
o Specifies the static and real-time input sources to Splunk

- Fields.conf
o Specifies the "indexed" fields used by Splunk and accessed through the search interface

- Transforms.conf
o Configuration of parsing, filtering, and transforming the input data

- Props.conf
o Hookups of filtering and transforming to the input source types

- Eventtypes.conf
o Not used anymore

- Limits.conf
o Search limits

- Tags.conf
o Not used anymore

- Outputs.conf
o Unused

- Viewstates.conf
o Not used anymore


Splunk can either parse data at search time or index it at input time. The latter approach was taken for space and time reasons. The bulk of the configuration is in "transforms.conf". Data is first filtered to select only the messages that contain login, logout, or pat association info. It is then transformed to put the relevant data elements in a comma separated form. Lastly, these csv elements are indexed to their respective field names. The relevant fields are detailed below:

Field Name          Description
_time               timestamp info (automatically selected by splunk)
dmca_cnet           User CNetID
dmca_priv_ip        Private network address for the session (will be the same as the public address if no private 10.x.x.x address is assigned)
dmca_src_ip         Public network address for web traffic
dmca_mac            Mac address of the session
dmca_src_port       Public network port for web traffic (used for PAT data)
dmca_has_login      Defined to "1" if the entry contains a user login
dmca_has_logout     Defined to "1" if the entry has a user logout
dmca_has_traffic    Defined to "1" if the entry is for web traffic going through PAT rather than a user login or logout entry
dmca_has_mac        Defined to "1" if the entry contains a mac address


Example lines of different source types:

Cvpn: logins only implemented

Jan 18 11:52:50 205.208.120.4 Jan 18 2011 11:52:50: %ASA-4-722051: Group <remote-users-default-group-policy> User <jmontgomery> IP <128.135.99.114> Address <205.208.123.133> assigned to session

Nat-service-block: dynamic udp/tcp translations

Jan 18 10:42:49 128.135.100.100 Jan 18 2011 10:42:49: %ASA-6-305011: Built dynamic TCP translation from nat-inside-vlan2712:10.150.47.54/54794 to nat-outside-vlan2711:128.135.100.102/27537

Perfigo: logins and logouts (both explicit and timeouts)

Jun 9 00:02:33 10.135.2.138 Perfigo: Authentication:[C0:CB:38:02:AF:13 ## 128.135.219.4] jwlapinski - Successfully logged in, Provider: RADIUS-Production, L2 MAC address: C0:CB:38:02:AF:13, Role: wireless, OS: Windows Vista

Jun 9 00:03:01 10.135.2.138 Perfigo: Authentication:Unable to ping 128.135.219.225, going to logout user csoltys

Jun 9 00:25:51 10.135.2.138 Perfigo: Authentication:[B4:07:F9:A6:F5:92 ## 128.135.149.174] kennethoshita - Logged out successfully

Radius: logins and logouts output by the radacctdump program

2011-06-09 14:14:17;login;10.150.24.29,24:AB:81:BD:E4:CC,jxchong

2011-06-09 14:14:21;logout;205.208.124.79,bspoka


When developing regular expressions for Splunk, I found the tool "Reggy" for MacOSX invaluable. The Splunk regexes always assume that the timestamp is the first token, and so a time extraction is not explicitly given in the user-defined REGEX.

Example of the transformation sequence for the following logging entry:

Jun 9 00:02:33 10.135.2.138 Perfigo: Authentication:[C0:CB:38:02:AF:13 ## 128.135.219.4] jwlapinski - Successfully logged in, Provider: RADIUS-Production, L2 MAC address: C0:CB:38:02:AF:13, Role: wireless, OS: Windows Vista




Since the data comes into splunk on an input configured as the "perfigo" source, as given in "inputs.conf", the "props.conf" file configures the transformation sequence as follows in the "perfigo" source stanza. I have numbered the lines for referencing convenience:

[perfigo]
1. TRANSFORMS-perfigo_0_filter = filter_null,filter_perfigo
2. TRANSFORMS-perfigo_1_set_login = set_perfigo_login
3. TRANSFORMS-perfigo_1_set_logout_user = set_perfigo_logout_user
4. TRANSFORMS-perfigo_1_set_logout_auto = set_perfigo_logout_auto
5. TRANSFORMS-perfigo_2_extract_login = extrct_perfigo_login
6. TRANSFORMS-perfigo_2_extract_logout = extrct_perfigo_logout

Splunk orders these sequences lexicographically, so I named them so that they will be processed in the order listed in the "props.conf" file. Each one of these lines must begin with the string "TRANSFORMS" to indicate that it is referencing a transformation configuration defined in "transforms.conf". When following along, have both "props.conf" and "transforms.conf" open.

1) does the initial regex filtering. Two configurations are listed, "filter_null" and "filter_perfigo". The "catch all" configuration is always listed first. "filter_null" simply drops the data, which is the behavior we want when the data matches no other regular expressions. "filter_perfigo" looks for "(Successfully logged in)|(going to logout user)|(Logged out successfully)". The input text contains the string "Successfully logged in", so it matches this regex. The configuration indicates that this data should be routed to the "indexQueue", meaning that it should be subject to further processing and finally indexed and stored. So we passed the first test, but the data is still just raw unstructured text.

2) This is the text transformation step, where we select only the fields that are of interest. The regex defined for "set_perfigo_login" is run against the input text and a match is found. The regex contains groups with the pertinent field information to be extracted. These groups are extracted as given in the "FORMAT" specification, with the additional string ";login;" inserted, and the result is passed to the next stage in the process.

At this point the data now looks like this:

Jun 9 00:02:33;login;C0:CB:38:02:AF:13,128.135.219.4,jwlapinski

3-4 are run against the input but result in no match.

5) This is a field extraction step, where fields are named as in a hash structure with key->value pairings. "extrct_perfigo_login" is run, which matches the output from 2) and binds the named groups in the regex to the indexed fields given in the "FORMAT" spec. The additional bindings not in the regex, "dmca_has_login"=>"1", "dmca_has_traffic"=>"1", "dmca_has_mac"=>"1", are also assigned. These indexed fields can be referenced directly as search query parameters. The raw text output from 2) will also be stored in the db and will show up when you do searches in the splunk web interface.

6) is run, but the output from 2) does not match its REGEX filter, so nothing happens.
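For orientation, the transforms.conf stanzas behind steps 1), 2), and 5) have roughly the following shape. This is only a sketch: the regexes and exact stanza bodies here are illustrative, and the real ones live in splunk/etc/apps/search/local/transforms.conf in the repository.

[filter_null]
# default route: drop anything that no later filter claims
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue

[filter_perfigo]
# keep only the login/logout messages of interest
REGEX = (Successfully logged in)|(going to logout user)|(Logged out successfully)
DEST_KEY = queue
FORMAT = indexQueue

[set_perfigo_login]
# rewrite _raw to "<timestamp>;login;<mac>,<ip>,<user>"  (regex illustrative)
REGEX = ^(\w+\s+\d+\s+[\d:]+).*Authentication:\[([0-9A-F:]+) ## ([\d.]+)\]\s+(\S+)\s+-\s+Successfully logged in
DEST_KEY = _raw
FORMAT = $1;login;$2,$3,$4

[extrct_perfigo_login]
# bind the csv elements produced above to the indexed field names
REGEX = ;login;([0-9A-F:]+),([\d.]+),(\S+)$
FORMAT = dmca_mac::$1 dmca_priv_ip::$2 dmca_cnet::$3 dmca_has_login::1 dmca_has_traffic::1 dmca_has_mac::1
WRITE_META = true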





Problems and Future Considerations

- Poor mechanism to ensure high availability of log data
o No "auto-detect" process to see that the splunk server has gone down. When an outage is discovered, the "datacopy" utility must be used to copy over the missing data

- No Splunk support contract in effect
o Enterprise license donated by Dan Sullivan: 13.5 GB index limit per day (current data utilization is less than 5% of this license); product worth $50k
o Not paying the support contract of about $8.6k/yr. The consequence is that we are version locked at 4.1.6 and cannot upgrade even to a revision release. The good news is that no issues have been discovered yet

Email correspondence concerning the support contract:

From: Margaret Patzer <mpatzer@splunk.com>
Subject: RE: Splunk Support
Date: January 28, 2011 1:57:09 PM CST
To: Justin Montgomery <jmontgomery@uchicago.edu>

Hi Justin:

Currently you have to have an in-force support contract to get to the new version. You can keep the latest version you got but are not entitled to get 4.1.7 or newer until the support is brought current. I know it is not what you wanted to hear.

Sorry!

Best Regards,

Margaret Patzer
Account Manager, IA, IL, MN, WI
call me Direct at: 972-244-8762
mobile # 214-394-9261

Need a quick answer to a straightforward Splunk question?
Check out our new answers page - http://answers.splunk.com

From: Justin Montgomery [mailto:jmontgomery@uchicago.edu]
Sent: Friday, January 28, 2011 1:36 PM
To: Margaret Patzer
Subject: Re: Splunk Support

I love your product, but is the support contract required? I saw on your support page that it is needed to upgrade to a new major version of splunk. Right now we are using splunk to aggregate syslog messages to store authentication events. This is used to identify which user was logged in given an ip address/port and a time.

Until we need to expand our use case further, I am happy with our current version of splunk.

Would our license still be active and useable, and could we upgrade to any 4.x.x version without purchasing the support contract?

Thanks,

Justin

On Jan 28, 2011, at 10:26 AM, Margaret Patzer wrote:

Hi Dan:

Good to hear from you. The total per year for 13GB is 8266.60.

Best Regards,

Margaret Patzer
Account Manager, IA, IL, MN, WI
call me Direct at: 972-244-8762
mobile # 214-394-9261

Need a quick answer to a straightforward Splunk question?
Check out our new answers page - http://answers.splunk.com

-----Original Message-----
From: Daniel Sullivan [mailto:dansully@uchicago.edu]
Sent: Friday, January 28, 2011 10:24 AM
To: Margaret Patzer
Cc: Daniel Sullivan; Justin Montgomery
Subject: Splunk Support

Margaret,

Can you tell me how much we are supposed to be paying per year for support on our 13.5GB/day Splunk license?

Thanks,

Dan Sullivan
312-607-3702

Justin Montgomery
University of Chicago: ITS
6045 Kenwood Ave. Rm 310-15
773-834-5247
jmontgomery@uchicago.edu



Alternative to Splunk:

- Store authentication info in an Oracle db and PAT translation info in a cdb (use the cdb implementation TinyCDB: http://www.corpit.ru/mjt/tinycdb.html)

- The cdb would use the key (source_ip_address, source_port) and map it to a random-access list of (timestamp, private_ip) entries. The list would be ordered chronologically with the oldest entry first.



- Every hour (or possibly day; would have to experiment) the cdb file would roll over. The lookup query would be performed on the correct cdb file given the input timestamp

- The idea is that over 99% of the data volume is from PAT entries. PAT entries lend themselves to a hashtable implementation, and cdb reports extremely fast reads and decently fast writes for this kind of data



- Trouble is that there are a ton of writes and the disk will be constantly pounded by this implementation. Also need file-level locking so that if a query is in progress on a cdb file, updates must block until it is done, and vice-versa for updates in progress

o The most efficient locking implementation will be a read-write lock => allow as many concurrent readers on the file as needed, but if a write is in progress then only that writer and no other readers can access the file during the write operation

o The partitioning of the file space will prevent the files from growing beyond the 32-bit 4GB limit and avoid locking on historical data

o Hourly was chosen as a possible rollover time because most dmca searches are on data at least several hours old, and so locking could be avoided



- The setup will only allow forward lookups from public ip/port to private ip; otherwise, a lookup table would be needed in the db for storing the private-to-public associations, since a single private ip can correspond to many unique PAT entries. The use case for dmca supports this, as lookups are always forward

- The schema for the db could be as simple as a single table. Each entry would be a "login" or a "logout". Columns would be (timestamp, type (in|out), CNetID, ip_address, source (radius, perfigo, etc), mac_address). Separate tables for logins and logouts could be considered to avoid "tripping over" data in searches putting together a session from a login and a logout. I actually think the two table approach is superior from a performance standpoint, but a dba should be consulted
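A sketch of the single-table variant in Oracle-style DDL (table and column names are illustrative, not an agreed schema):

CREATE TABLE auth_event (
    event_time   TIMESTAMP     NOT NULL,
    event_type   VARCHAR2(3)   NOT NULL CHECK (event_type IN ('in','out')),
    cnetid       VARCHAR2(64)  NOT NULL,
    ip_address   VARCHAR2(39)  NOT NULL,
    source       VARCHAR2(32)  NOT NULL,   -- radius, perfigo, cvpn, ...
    mac_address  VARCHAR2(17)
);
CREATE INDEX auth_event_ip_time ON auth_event (ip_address, event_time);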



- Pseudo query to put together a session would be:

o Find login: "select * from login t where t.ip_address = ip AND t.timestamp < eventTime order by timestamp DESC limit 1;"

o Find logout:

  A) "select * from logout t where t.ip_address = ip AND t.timestamp > loginTime order by timestamp ASC limit 1;"

  B) "select * from login t where t.ip_address = ip AND t.timestamp > loginTime order by timestamp ASC limit 1;"

  Select either A or B based on which timestamp comes first.

  If the cnetid in the logout does not match that in the login, a "mismatch" occurs as defined in net2user.



- A configuration file should be present to determine whether an IP is a "PAT ip" or a regular session ip. If it is a "PAT" ip, then the cdb must be queried to get the private ip. The algorithm for the lookup in the list, once the appropriate key has been found, is:

int index = -1;
if (list.size() > 1)
{
    // scan from newest to oldest for the first entry with a timestamp before t
    for (int i = list.size() - 1; i >= 0; --i)
    {
        if (list[i].timestamp < t)
        {
            index = i;
            break;
        }
    }
    // found the newest entry with a timestamp before t, so the "latest" permitted
    // entry is the one after it
    if (index != -1 && index + 1 < list.size())
        return list[index + 1];
}
else if (list[0].timestamp > t)
{
    return list[0];
}