Glassbox: Open Source Java Monitoring and Troubleshooting


Nov 12, 2013

Ron Bodkin

Glassbox Project Leader

ron.bodkin@glassbox.com


© Glassbox Corporation 2005-2007. All Rights Reserved

First a summary…


Glassbox is an open source automated
troubleshooter for Java applications.


It monitors to detect performance problems and
failures


It analyzes the data to pinpoint causes


Troubleshooting at every phase is key,
especially when integrating new technologies like AJAX and ESB


You don’t have to bake it in. Drop glassbox.war
into your app server, and it troubleshoots your
existing apps with ~1% overhead.


Glassbox provides the open source community with
an easy, automated troubleshooter;
consider adding it to your stack


Agenda


Glassbox Intro and Demo


Using Glassbox


Extending Glassbox


Conclusion


Glassbox Troubleshoots Java


Glassbox provides automated diagnosis


Discovers application problems


Suggests causes


Rules out incorrect hypotheses


Useful across the development cycle


Dev, QA, production


Glassbox Key Features


Drop-in installation. Up in minutes on existing
apps with no source or build changes needed.


One-click problem diagnosis, focused on the 80% of
common issues


Glassbox learns the app; the user doesn’t need to
configure it


Flags Service Level Agreement violations


Clear descriptions with supporting evidence


Low overhead: won’t slow down production
applications.


Extensible: add application-specific logic to
general-purpose performance and failure detection


Enterprise Monitoring…

[Diagram: how different groups see problems. Labels: Users, Support, Ops, Dev, CEM, System Monitors, Traction Problem, Prioritization Problem. Quoted symptoms: “Conversion rates down”, “Can’t log in”, “Order entry screen slow”, “Blip on a graph”, “Bug in the code”, “Worrying trend”.]


Glassbox Live Installation & Demo


Install


Troubleshoot existing app


Real-time results


Agenda


Glassbox Intro and Demo


Using Glassbox


Extending Glassbox


Conclusion



Performance


Low overhead at runtime… suitable for
production


<1% increase in end to end response times


Focused data capture on slow operations


Not heavy instrumentation


Low frequency (100 ms) sampling


Glassbox uses load-time weaving, which
has little effect on end-to-end speed


~40% slower for class loading (initialization only)


~20% memory overhead
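
The low-frequency sampling mentioned above can be illustrated with a small sketch. This is not Glassbox's sampler: the class StackSampler and its methods are hypothetical names chosen for illustration, and only standard JDK calls are used. It counts how often each method is found on top of a thread's stack, sampling every 100 ms.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical sampler: counts which method is on top of each thread's stack. */
final class StackSampler {
    private final Map<String, Integer> hits = new HashMap<>();

    /** Records one sample from a snapshot of thread stacks. */
    synchronized void record(Map<Thread, StackTraceElement[]> stacks) {
        for (StackTraceElement[] stack : stacks.values()) {
            if (stack.length == 0) {
                continue; // no stack frames available for this thread
            }
            StackTraceElement top = stack[0];
            hits.merge(top.getClassName() + "." + top.getMethodName(), 1, Integer::sum);
        }
    }

    /** Number of samples in which the given "Class.method" was the top frame. */
    synchronized int hitsFor(String method) {
        return hits.getOrDefault(method, 0);
    }

    /** Samples all live threads every 100 ms until the returned executor is shut down. */
    ScheduledExecutorService start() {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "stack-sampler");
            t.setDaemon(true); // do not keep the JVM alive just for sampling
            return t;
        });
        timer.scheduleAtFixedRate(
                () -> record(Thread.getAllStackTraces()), 100, 100, TimeUnit.MILLISECONDS);
        return timer;
    }
}
```

Sampling at a fixed, low rate keeps the cost roughly constant regardless of how many methods the application calls, which is the trade-off the slide contrasts with heavy instrumentation.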


Glassbox 2.0 Open Source


Non-invasive data capture


Captures data with Aspect-Oriented Programming


Uses Java 5+ and server JMX data


Glassbox Troubleshooter analysis


Automatic diagnosis of the 80% of common
problems


Correlates, compares, analyzes data from data
capture & summary


Exposed through an AJAX Web client


Open Source LGPL License


Supports Java 1.4 and later


Glassbox 2.0 release by October


What is in the Glassbox 80%?


Slow database query


Failing database query


Bottleneck or thread contention


Slow remote call (web service, EJB, Ajax…)


Failing remote call


Database connection failure or slowness


Java Mail Issues


Broken FTP



Failing operation


Slow Java


What Glassbox Won’t Do


Solve problems that haven’t happened yet


Having a problem in hand means alerts are actionable.


Diagnose every problem


80% of your effort goes toward resolving common
problems; those are the ones we focus on


Provide low-level data crunching tools


Automate a people process


Clear data improves collaboration more than workflow
automation.


Glassbox doesn’t (for example) locate the owner of code,
or send an email to someone’s manager.


Web-accessible HTML does enable collaboration


Common Application Problems

[Diagram of common application problems. Categories: Page problem, Other code problem, Config problem, Network problem, App server problem, Database problem, Systemic problem, Hardware problem, Design problem. Examples shown: lost HTTP session, misconfigured load balancer, build problem, deployment problem, JDBC driver incompatibility, VM bug, app server bug, cache size, connection pool size, 1000 XML parses, slow query, out of memory, out of disk, 1000 db hits, too much rope, HTTP session abuse, missing index, network hiccup, race condition, code doesn’t scale, resource leak (memory, connections), gluttony, invasive GC, slow gateway, deadlock, other bug, contention w/ other processes, outdated statistics, surprise table scan, too few threads, too many threads.]


Glassbox Concepts


Operations


Service-Level Agreements


Response API


Plug-in API


Clustered Monitoring


Operations


Glassbox rolls up performance by operation:


A single request processed on behalf of an action


e.g. Spring Controller, Servlet, JSP, EJB, JAX-WS, DWR call…


Glassbox counts each associated request towards a
single operation


it prioritizes operations (Spring Controllers are higher
priority than Servlets, etc.)


it picks parents over children if equal priority


in future we’ll add “requires new” semantics


This is more application-oriented than the
traditional profiler call tree world view


Glassbox uses operations to define service levels…
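
The selection rule described above (higher priority wins, and a parent beats a child of equal priority) can be sketched as follows. The classes Candidate and OperationPicker, and the numeric priorities, are assumptions made for illustration; they are not taken from the Glassbox source.

```java
import java.util.List;

/** Hypothetical model of a candidate operation seen while handling one request. */
final class Candidate {
    final String name;
    final int priority; // higher wins, e.g. a Spring Controller above a Servlet
    final int depth;    // 0 = outermost (parent), larger = more deeply nested (child)

    Candidate(String name, int priority, int depth) {
        this.name = name;
        this.priority = priority;
        this.depth = depth;
    }
}

final class OperationPicker {
    /** Picks the highest-priority candidate; on a tie the parent (smaller depth) wins. */
    static Candidate pick(List<Candidate> candidates) {
        Candidate best = null;
        for (Candidate c : candidates) {
            if (best == null
                    || c.priority > best.priority
                    || (c.priority == best.priority && c.depth < best.depth)) {
                best = c;
            }
        }
        return best; // null when the request matched no known operation type
    }
}
```

With a servlet at depth 0 and a Spring controller nested inside it, the controller is chosen because of its higher priority; with two nested servlets, the outer one is chosen.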



Service-Level Agreements



Glassbox service level agreements are based on the
operation


Real user issues are not subtle under load: failing
or slow code sticks out like a sore thumb.


Failing operations: Glassbox presents any failure as
serious.


Slow operations: the base SLA flags any operation that is slower
than 1 sec 5% of the time. This can be configured.


Soon: more granular thresholds by specifying a
hierarchical SLA level based on Java packages,
classes, and annotations
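
As a rough illustration of the base SLA rule, the sketch below flags an operation when more than a given fraction of its requests exceed a time threshold. The class SlaCheck is hypothetical, and reading "5% of the time" as "strictly more than 5% of requests" is an assumption, not something the slides specify.

```java
/** Hypothetical SLA check; names and the strict comparison are assumptions. */
final class SlaCheck {
    /**
     * Returns true when the fraction of requests slower than thresholdMillis
     * is greater than allowedSlowFraction (for example 0.05 for 5%).
     */
    static boolean violatesSla(long[] responseMillis, long thresholdMillis,
                               double allowedSlowFraction) {
        if (responseMillis.length == 0) {
            return false; // no data, nothing to flag
        }
        int slow = 0;
        for (long ms : responseMillis) {
            if (ms > thresholdMillis) {
                slow++;
            }
        }
        return (double) slow / responseMillis.length > allowedSlowFraction;
    }
}
```

For example, with a 1000 ms threshold and a 0.05 fraction, 6 slow requests out of 100 count as a violation while 4 out of 100 do not.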



Response APIs



Glassbox has a listener API to track meaningful responses to
various requests


The console statistics come from summarizing these
responses


There are more detailed log topics available (with more
coming)


This also supports alternative forms of summarization


Events can be generated based on:


Aspect monitors


Event listeners


Interceptors


Annotations


Directly in code


Thread sampling is currently done as an auxiliary approach…
but will be merged in the future
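
To make the listener idea concrete, here is a minimal sketch of a listener that summarizes responses per operation. The interface ResponseListener and the class OperationSummary are hypothetical and only mimic the shape of such an API; the real Glassbox response API may look quite different.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical listener interface; not the actual Glassbox API. */
interface ResponseListener {
    void finished(String operation, long elapsedMillis, boolean failed);
}

/** Summarizes responses per operation, similar in spirit to console statistics. */
final class OperationSummary implements ResponseListener {
    // Per operation: {count, failures, totalMillis}
    private final Map<String, long[]> stats = new HashMap<>();

    @Override
    public synchronized void finished(String operation, long elapsedMillis, boolean failed) {
        long[] s = stats.computeIfAbsent(operation, k -> new long[3]);
        s[0]++;
        if (failed) {
            s[1]++;
        }
        s[2] += elapsedMillis;
    }

    synchronized long count(String operation) {
        long[] s = stats.get(operation);
        return s == null ? 0 : s[0];
    }

    synchronized long failures(String operation) {
        long[] s = stats.get(operation);
        return s == null ? 0 : s[1];
    }

    synchronized double meanMillis(String operation) {
        long[] s = stats.get(operation);
        return s == null || s[0] == 0 ? 0.0 : (double) s[2] / s[0];
    }
}
```

Because the summary is just one listener, an alternative form of summarization (for example, a log writer) could be registered alongside it without touching the code that produces the events.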


Plug-In API


Glassbox has a plug-in API that gives
applications added flexibility:


Add Custom Operations or new Frameworks


Define Custom SLA (e.g., stale cache)


Define your own Analysis process


Create a Custom UI: Velocity reports for
operations…


Add additional functionality like automatic
connection discovery


Clustered Monitoring


Glassbox allows monitoring multiple servers


The console can connect to other VMs via
JMX/Remote or Direct RMI


Connections are saved to a configuration database


Advanced configuration via properties and Spring


Glassbox clients display data for each (or you can
select subsets)


You can also connect remotely with JMX to get
lower-level statistics (e.g., via JConsole)


This also allows connecting to non-Web containers
from outside (e.g., monitoring a Swing UI)
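
Connecting remotely with JMX can also be done from code using the standard javax.management API. The sketch below lists MBean names that match a pattern. The class JmxBrowser is a hypothetical helper; the host, port, and pattern are parameters because the slides do not name the Glassbox MBean domain, and the URL form shown is the usual RMI connector address.

```java
import java.io.IOException;
import java.util.Set;
import java.util.TreeSet;
import javax.management.MBeanServerConnection;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

/** Hypothetical helper for browsing MBeans over JMX. */
final class JmxBrowser {
    /** Lists the names of MBeans matching a pattern such as "java.lang:type=*". */
    static Set<String> names(MBeanServerConnection connection, String pattern)
            throws IOException, MalformedObjectNameException {
        Set<String> result = new TreeSet<>();
        for (ObjectName name : connection.queryNames(new ObjectName(pattern), null)) {
            result.add(name.getCanonicalName());
        }
        return result;
    }

    /** Connects to a remote JVM over the standard RMI connector and lists matching MBeans. */
    static Set<String> remoteNames(String host, int port, String pattern)
            throws IOException, MalformedObjectNameException {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            return names(connector.getMBeanServerConnection(), pattern);
        }
    }
}
```

The same names method works against the local platform MBean server, which makes it easy to try the query before pointing it at a remote VM.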


Glassbox Architecture

[Architecture diagram. Labels: Client Machine, Browser, HTTP, Application Server Machine, Web Server, Java Virtual Machine, Java Application, AOP, Agent, Web Services, Any Database, Application.]


Glassbox Data in JConsole


Persistence via Logging…


Individual operations that fail or are slow


Requests that exceed a performance
threshold (configurable at runtime)


multiple entries per operation available


Periodic sampled time traces


Debugging


Response trace


AspectJ weaving details


2.0.x: summarized entry per operation



Agenda


Glassbox Intro and Demo


Using Glassbox


Extending Glassbox


Conclusion



Monitoring with Aspects


Aspects run automatically at well-defined points at runtime


No need to instrument code


Allows low overhead tracking


Easy to update monitoring policies


Enable and disable, even sampling


Standardized support


Extensible with open source AspectJ language.


Load-time weaving avoids changes to build process.


Spring AOP allows coarse-grained components


Flexibility


Reuse open source monitors for common APIs


Easy to extend for custom monitoring…


Glassbox is easily extensible


With Spring AOP


@AspectJ style with Java 5 Annotations


XML Schema-based in Spring config file


Avoids time and complexity of weaving many classes


Allows instance-based configuration


With AspectJ


XML configuration for simple cases


@AspectJ style with Java 5 Annotations


AspectJ code style


Directly via response and plug-in APIs


AOP Concepts


Aspects


a type that can crosscut other types


Join points


well-defined points in program flow


Pointcuts


join point ‘queries’ that match and bind context


Advice


block of code executed at specified pointcuts



Spring AOP Overview


Proxy-based AOP support for Java


Integrated with Spring Bean Container


Proxies generated at runtime


Avoids time and complexity of weaving many classes


Provides instance-based configuration


Provides a simple tool chain


With good IDE support as of 2007!


But less fine-grained control


Useful for monitoring Spring-configured beans


Not useful for objects created externally to Spring
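
The proxy-based approach can be illustrated with the JDK's own java.lang.reflect.Proxy, without Spring. This is a sketch of the general technique only, not Spring's implementation: the FooService interface here and the TimingProxy class are made up for the example. The proxy wraps one instance and runs timing "advice" around every interface method call.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

/** Example service interface (hypothetical). */
interface FooService {
    String greet(String name);
}

/** Wraps a single instance in a JDK dynamic proxy that times each call. */
final class TimingProxy {
    static final List<String> log = new ArrayList<>();

    static <T> T monitor(T target, Class<T> iface) {
        InvocationHandler handler = (proxy, method, args) -> {
            long start = System.nanoTime();
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the service's own exception
            } finally {
                long micros = (System.nanoTime() - start) / 1_000;
                log.add(method.getName() + " took " + micros + " us");
            }
        };
        Object proxy = Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, handler);
        return iface.cast(proxy);
    }
}
```

Only calls that go through the proxy are observed, which is why this style suits container-managed beans and misses objects created elsewhere.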


Spring AOP Mechanisms

[Diagram: javac compiles FooService (.java source) and ServiceMonitor (.java source) to bytecode. The Spring container then performs dynamic proxy creation per instance of advised beans, shown as a FooService bean and a FooService proxy.]


Schema AOP with Glassbox

public class ServiceMonitor {

    private ResponseFactory responseFactory =
        AbstractMonitor.getResponseFactory();

    private FailureDetectionStrategy fdStrategy =
        AbstractMonitor.getFailureDetectionStrategy();

    // getters and setters for fields

    // Advice method: runs before service execution.
    // Uses the Glassbox Response API.
    public void serviceStart(Service service) {
        Response response =
            responseFactory.getResponse(service.getId());
        response.setLayer(Response.SERVICE_PROCESSOR);
        response.start();
    }


Schema AOP with Glassbox





    // Advice method: runs after the service throws.
    public void serviceError(Service service, Throwable t) {
        Response response = responseFactory.getLastResponse();
        FailureDescription description =
            fdStrategy.getFailureDescription(t);
        response.set(Response.FAILURE_DATA, description);
        response.complete();
    }

    // Advice method: runs after the service returns.
    public void serviceReturn(Service service) {
        responseFactory.getLastResponse().complete();
    }
}


Schema AOP with Glassbox

<aop:config>
    <aop:pointcut id="serviceExec"
        expression="execution(public * *(..)) and this(service)"/>


Picks out join points where services execute


Matches public method executions


with any return type


on any type, with any method name


with any number of parameters


Where the currently executing object can be
bound to service (the advice methods limit
that to type Service)


Schema AOP with Glassbox

<aop:config>
    <aop:pointcut id="serviceExec" …/>
    <aop:aspect id="serviceMonitorAspect"
        ref="serviceMonitor">
        <aop:before pointcut-ref="serviceExec"
            method="serviceStart"/>
        <aop:after-throwing pointcut-ref="serviceExec"
            throwing="t"
            method="serviceError"/>
        <aop:after-returning pointcut-ref="serviceExec"
            method="serviceReturn"/>
    </aop:aspect>
</aop:config>

<bean id="serviceMonitor" class="my.ServiceMonitor"/>

Binds to advice methods in service monitor bean

Aspect defines advice to execute at pointcuts


Putting it together…

<aop:before pointcut-ref="serviceExec"
    method="beforeService"/>


Before advice runs before… (this is one kind of advice)


any join point that matches the pointcut
definition for serviceExec…

execution(public * my.service.Service+.*(..))


by dispatching to the beforeService method
on the serviceMonitor bean


Spring @AspectJ with Glassbox

@Aspect
public class ServiceMonitor extends AbstractMonitorClass {

    @Pointcut("execution(public * *(..)) && this(service)")
    public void monitor(Service service) {
    }

    @Before("monitor(service)")
    public void serviceStart(Service service) {
        begin(service.getId());
    }

    @AfterReturning("monitor(*)")
    public void serviceReturn(StaticPart staticPart) {
        endNormally(staticPart);
    }




Spring @AspectJ with Glassbox




    @AfterThrowing(pointcut="monitor(*)", throwing="t")
    public void serviceException(Throwable t,
                                 StaticPart staticPart) {
        endException(t, staticPart);
    }

    public String getLayer() {
        return Response.SERVICE_PROCESSOR;
    }
}


Here the aspect just needs to extend AbstractMonitor


this is a built-in Glassbox framework AspectJ aspect


it provides configuration, template methods,
runtime control


AspectJ Overview


The original AOP implementation


Language extension, @AspectJ annotation, and XML definition
options


Java platform compatible


Performance comparable to hand-written equivalent


Tool support


Compiler, linker, classloader-based weaving


IDE support: Eclipse, JBuilder, JDeveloper, NetBeans


Ant, Maven, ajdoc, Java debugger


Open source: http://eclipse.org/aspectj


AspectJ Mechanisms

[Diagram: javac or ajc compiles FooService (.java source) and ServiceMonitor (.aj/.java source) to bytecode. The ajc weave step then yields FooService (modified bytecode) and ServiceMonitor (original bytecode) for the runtime system.]


Relies on bytecode modification of aspect-affected classes


Weave can happen at compile, post-compile, or load time


Can package as class files, jars, or in-memory bytecodes


AspectJ XML customizing Glassbox

<aspectj>
    <weaver>
        <include within="my.service.Service+"/>
    </weaver>
    <aspects>
        <concrete-aspect name="ServiceProcessingMonitor"
            extends="glassbox.monitor.MethodMonitor">
            <pointcut name="monitoredPublicMethods"
                expression="within(my.service.Service+)"/>
        </concrete-aspect>
    </aspects>
</aspectj>


Sample AspectJ Code

public aspect EmailMonitor extends AbstractMonitor {

    public pointcut monitorPoint(Object message) :
        within(javax.mail.Transport+) &&
        execution(* javax.mail.Transport.send*(..)) &&
        args(message, ..);

    public Serializable getKey(Object message) {
        return "mail://";
    }

    public String getLayer() {
        return Response.RESOURCE_SERVICE;
    }
}


Glassbox Framework Code

public abstract aspect AbstractMonitor
    extends AbstractMonitorClass {

    before(Object identifier) : monitorPoint(identifier) {
        begin(getKey(identifier));
    }

    private pointcut monitorEndAllCases() :
        monitorEnd() || monitorPoint(*);

    after() throwing (Throwable t): monitorEndAllCases() {
        Response response = responseFactory.getLastResponse();
        FailureDescription description =
            failureDetectionStrategy.getFailureDescription(t);
        response.set(Response.FAILURE_DATA, description);
        response.complete();
    }

    after() returning : monitorEndAllCases() {




Using AspectJ Load-Time Weaving


Advantages of LTW


Don’t need to change build processes


Provides dynamic linking as Java developers expect

avoids cyclic dependencies (aspect M refers to type X, so X
needs to be woven with M…)


Challenges (as of AspectJ 1.5.3)


LTW still uses a lot of memory


LTW can add a lot of class loading time overhead


Glassbox carefully minimizes LTW overhead


Requires “opt-in” inclusion in aop.xml files


Includes optimized build of AspectJ LTW


Demo 2


Extending Glassbox


Agenda


Glassbox Intro and Demo


Using Glassbox


Extending Glassbox


Conclusion



Glassbox is Open Source

Automated Troubleshooting


Provides problem diagnosis for applications


Development: save troubleshooting time


QA: find bugs faster


Production: monitor apps and triage failures easily


Key Features


One click problem diagnosis


Provides clear descriptions with supporting evidence


Glassbox learns the app; the user doesn’t need to configure it


Flags SLA violations, so you know which problems matter


Low overhead with minimal impact on production applications.


Enabled by noninvasive monitoring. Up in minutes on any app
with no source or build changes needed.


Final summary…


Glassbox is an open source automated
troubleshooter for Java applications.


It monitors to detect performance problems and
failures


It analyzes the data to pinpoint causes


Troubleshooting at every phase is key,
especially when integrating new technologies like AJAX and ESB


You don’t have to bake it in. Drop glassbox.war
into your app server, and it troubleshoots your
existing apps with ~1% overhead.


Glassbox provides the open source community with
an easy, automated troubleshooter;
consider adding it to your stack


Code Samples and References…

Glassbox sample code, questions and
documentation:

Sample Code and zips:

http://www.glassbox.com/glassbox/News.do

Community:

http://www.glassbox.com/glassbox/Community.do

Forums:

http://www.glassbox.com/forum/forum/listforums


AspectJ References and questions at my blog:

http://rbodkin.blogs.com/





Thank you for coming!

We are open source and welcome
users, feedback, and help.


Please leave a business card to join
our project announcements list.


www.glassbox.com


ron.bodkin@glassbox.com




Appendix: Future Directions

We are looking for contributors!




Systems Management Integration

[Diagram of systems management integration. Labels: Monitoring Platform, App Fault Events, Multi-tier analysis, Issue Self-Reporting, Users, Customer Experience Mgmt, Watch User Traffic, Service / Support, Our fault / Your fault, Should I Escalate?, IT Ops, JVM (three instances), Identify Owner, Ops Playbook, Trending, Eng, Policy / Playbook Mgmt, App-specific detection, SNMP Monitors, Intelligent Network.]


Future scenario

Filter by customer



SLOW OPERATION: Login


Cause:
Makes too many database queries



List of queries



Select * from users where session = ?

Select * from user_stats where userid=?

Select * from metadata where session=?

Select * from user_stats where userid=?

Select * from user_stats where userid=?

Select * from user_stats where userid=?

Select * from user_stats where userid=?

Select * from metadata where session=?

Select * from user_stats where userid=?

Select * from user_stats where userid=?



1. Customer reports problem

2. Support rep sets filter: CustID = x

3. Customer retries operation

4. Customer-specific diagnosis

5. Transfer to issue tracking

6. Ops/dev gets complete problem report
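
The "makes too many database queries" diagnosis in this scenario could be approximated by counting the statements issued during one operation. The sketch below is a hypothetical illustration: QueryCounter, its crude normalization, and the idea of a fixed query limit are all assumptions, not the Glassbox analysis.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper that counts statements issued by one operation. */
final class QueryCounter {
    /** Counts executions per statement, ignoring case and whitespace differences. */
    static Map<String, Integer> count(List<String> queries) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String q : queries) {
            // Crude normalization: lower-case and drop all whitespace.
            String key = q.toLowerCase(Locale.ROOT).replaceAll("\\s+", "");
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    /** Returns true when one operation issued more queries than the assumed limit. */
    static boolean tooManyQueries(List<String> queries, int limit) {
        return queries.size() > limit;
    }
}
```

Grouping by statement makes repeated lookups, such as the many user_stats queries in the list above, stand out as a single entry with a high count.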


Reporting


By component-owner


For role


Developer, Assembler, Administrator


View an operation or problem across
servers


View live operation


Alternative views (by URL, by time)




Learning Normal Behavior


Performance requirements


Run the application as normal


Report deviations from this performance level


Operations


Identify different ways of rolling up operations


Roll up by URL pattern?
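
One simple way to picture "run the application as normal, then report deviations" is a baseline of response times with a tolerance band. The sketch below is an assumption-laden illustration: the class Baseline and the mean-plus-k-standard-deviations rule are chosen for the example and are not described in the slides.

```java
/** Hypothetical baseline learned from response times observed during normal running. */
final class Baseline {
    private final double mean;
    private final double stdDev;

    Baseline(double[] normalMillis) {
        if (normalMillis.length == 0) {
            throw new IllegalArgumentException("need at least one sample");
        }
        double sum = 0;
        for (double v : normalMillis) {
            sum += v;
        }
        mean = sum / normalMillis.length;
        double squares = 0;
        for (double v : normalMillis) {
            squares += (v - mean) * (v - mean);
        }
        stdDev = Math.sqrt(squares / normalMillis.length); // population standard deviation
    }

    /** True when the observation is more than `sigmas` standard deviations above the mean. */
    boolean deviates(double observedMillis, double sigmas) {
        return observedMillis > mean + sigmas * stdDev;
    }
}
```

With normal samples of 90, 110, 100 and 100 ms, a 3-sigma band ends at roughly 121 ms, so 120 ms is treated as normal and 125 ms as a deviation.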



Historical views


Telescoping views: 5 minutes, hour, day,
week, …


Performance degraded by comparison


Versus last week…


Comparison reports / what’s different


Identify a baseline and compare the current
situation against it


Between releases


After a “fix”




Extended Monitoring


More out of the box technologies


JMS, Web Services, batch queues, file processing


AJAX frameworks, Web frameworks, caches, O/R


SLAs defined in terms of


Throughput


Fairness


Extensible analysis


Rules, modules, dynamic updates…



Improved Clustering


Auto-discover configuration


Aggregate operations across cluster


Server built-in data


Central administration and permissions