Take my logs Please

AMInternet and Web Development

Nov 2, 2011 (5 years and 9 months ago)

803 views

Mike Brittain Director of Engineering, Infrastructure Etsy.com mike@etsy.com @mikebrittain

Take my logs. Please.
Mike Brittain
Director of Engineering, Infrastructure
Etsy.com
mike@etsy.com
@mikebrittain
(hello?)
This sounds
boooooorrrrring...
No, no... hang in there!
25 MM uniques/month
150 Countries
$300 MM+ sales last year
Apache
, PHP, MySQL,
PostgreSQL,
Memcache,
Gearman
,
Solr, etc.
What’s working?
What’s working?
Performance
What’s working?
Performance
Operability
What’s working?
Performance
Operability
Simplicity
Logging
+
Trending
App logging
(Apache access and error logs)
LogFormat "%h %l %u %t
\"%r\" %>s %b
“Common”
LogFormat "%h %l %u %t
\"%r\" %>s %b
\"%{Referer}i\"
\"%{User-agent}i\""
“Combined”
mod_log_config
%f
Filename requested
%k
# of keepalive requests served
on this connection
%T
Time taken to serve the request,
in seconds
%f
Filename requested
%k
# of keepalive requests served
on this connection
%D
Time taken to serve the request,
in microseconds
mod_log_config
%f
Filename requested
%k
# of keepalive requests served
on this connection
%D
Time taken to serve the request,
in microseconds
%{foobar}n
Contents of “note” foobar from
another module
mod_log_config
apache_note(“foobar”, $whatever);
apache_note()
LogFormat %{True-Client-IP}i %l %t \"%r\"
%>s %b \"%{Referer}i\"
\"%{User-Agent}i\" %V
%{user_id}n %{shop_id}n %{uaid}n
%{ab_selections}n %{request_uid}n
%{api_consumer_key}n
%{api_method_name}n
%{php_bytes}n %{php_microsec}n %D
“Steroids”
LogFormat %{True-Client-IP}i %l %t \"%r\"
%>s %b \"%{Referer}i\"
\"%{User-Agent}i\" %V
%{user_id}n %{shop_id}n %{uaid}n
%{ab_selections}n %{request_uid}n
%{api_consumer_key}n
%{api_method_name}n
%{php_bytes}n %{php_microsec}n
%D
“Steroids”
$GLOBALS['timer'] =
microtime
(true) * 1000000;
$GLOBALS['timer'] = microtime(true) * 1000000;
register_shutdown_function
('pageStats');
function pageStats() {
}
$GLOBALS['timer'] = microtime(true) * 1000000;
register_shutdown_function('pageStats');
function pageStats() {
$timer_end = microtime(true) * 1000000;
$diff = $timer_end - $GLOBALS['timer'];
}
$GLOBALS['timer'] = microtime(true) * 1000000;
register_shutdown_function('pageStats');
function pageStats() {
$timer_end = microtime(true) * 1000000;
$diff = $timer_end - $GLOBALS['timer'];

apache_note
('php_microsec', $diff);

apache_note
('php_bytes',
memory_get_peak_usage());
}
What about “
%D
”?
LogFormat %{True-Client-IP}i %l %t \"%r\"
%>s %b \"%{Referer}i\"
\"%{User-Agent}i\" %V
%{user_id}n
%{shop_id}n %{uaid}n
%{ab_selections}n %{request_uid}n
%{api_consumer_key}n
%{api_method_name}n
%{php_bytes}n %{php_microsec}n %D
“Steroids”
LogFormat %{True-Client-IP}i %l %t \"%r\"
%>s %b \"%{Referer}i\"
\"%{User-Agent}i\" %V
%{user_id}n %{shop_id}n %{uaid}n
%{ab_selections}n
%{request_uid}n

%{api_consumer_key}n
%{api_method_name}n
%{php_bytes}n %{php_microsec}n %D
“Steroids”
LogFormat %{True-Client-IP}i %l %t \"%r\"
%>s %b \"%{Referer}i\"
\"%{User-Agent}i\" %V
%{user_id}n %{shop_id}n %{uaid}n
%{ab_selections}n
%{request_uid}n
%{api_consumer_key}n
%{api_method_name}n
%{php_bytes}n %{php_microsec}n %D
“Steroids”
LogFormat %{True-Client-IP}i %l %t \"%r\"
%>s %b \"%{Referer}i\"
\"%{User-Agent}i\" %V
%{user_id}n %{shop_id}n %{uaid}n
%{ab_selections}n
...
easy_reg=1; personalize_widget=0;
icon_in_cornflower_blue=1;
“Steroids”
Coming soon...
%{locale}n

(i18n)
%{platform}n

(desktop vs. mobile)
%{locale}n

(i18n)
%{platform}n

(desktop vs. mobile)
OPS-1805, OPS-1827
etsy.com/careers
Coming soon...
Using something else?
time, http method, request uri,
response code, referer, user-agent,
response time, response memory,
custom segmentation fields...
Quick averages
grep "GET /listing/" access.log | \
awk '{sum=sum+$(NF-1)} END {print sum/NR}'
Quick graphs
grep "GET /listing/" access.log | \
perl -pe "s/.*\[.*\d{4}:(\d{2}):(\d{2}):\d{2}.*\]/\1:\2/" | \
awk '{print $1, $(NF-1)}' > /tmp/pagetimes.dat
gives you...
Quick graphs
# /tmp/pagetimes.dat
18:37 251.0
18:38 252.1
18:39 253.5
18:40 251.0
18:45 250.0
and then...
Quick graphs
# GNUPLOT
set terminal png
set output 'listings.png'
set yrange [0:2000]
set xdata time
set timefmt "%d/%B/%Y:%H:%M:%S"
set format x "%H:%M"
plot '/tmp/pagetimes.dat' using 1:2 with points
Quick graphs
Error logs
PHP + Apache errors in one file
Simple logging interface
Error logs
Levels: error, info, debug
Namespace: perf, sql, __class__
Logger::
error
("Query exceeded 5 sec: $query",

sql_long_query
”);
web0054 [Fri Mar 04 16:27:48 2011] [
error
]
[
sql_long_query
] [mk04gw1p71] Query exceeded
5 sec: SELECT * FROM ...
web0054 [
Fri Mar 04 16:27:48 2011
] [error]
[sql_long_query] [mk04gw1p71] Query exceeded
5 sec: SELECT * FROM ...
$ grep "
16:27:48
" access.log | wc -l
1527
web0054 [Fri Mar 04 16:27:48 2011] [error]
[sql_long_query] [
mk04gw1p71
] Query exceeded
5 sec: SELECT * FROM ...
iow
error.log -> request_uid -> access.log
request uri, ab selections, user id, locale,
platform, api key, etc.
Filtering
tail -f error.log |
grep -v
“sql_long_query” | ...
web0001 [04:28:54 2011] [error] [client 10.101.x.x] Oh noooooo!
web0001 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh!
web0001 [04:28:54 2011] [error] [client 10.101.x.x] Help me, Rhonda.
web0001 [04:28:54 2011] [error] [client 10.101.x.x] Oh noooooo!
web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!
web0001 [04:28:54 2011] [error] [client 10.101.x.x] Heeeeeeellllllllllllllppppp!
web0001 [04:28:54 2011] [error] [client 10.101.x.x] Oh noooooo!
web0001 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!
web0201 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh!
web0034 [04:28:54 2011] [warning] [client 10.101.x.x] Oh nooooooooooo
web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
web1101 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
web0201 [04:28:54 2011] [error] [client 10.101.x.x] You've been eaten by a grue.
web0055 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!!!
web0002 [04:28:54 2011] [warning] [client 10.101.x.x] Sky is falling.
web0089 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
web0020 [04:28:54 2011] [error] [client 10.101.x.x] Sky is falling.
web1101 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!
web0055 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh!
web0001 [04:28:54 2011] [warning] [client 10.101.x.x] Oh nooooooooooo
web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
web0034 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
web0087 [04:28:54 2011] [fatal] [client 10.101.x.x] Sky is falling.
web0002 [04:28:54 2011] [error] [client 10.101.x.x] Oh noooooo!
web0201 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!
web0077 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh!
web0355 [04:28:54 2011] [warning] [client 10.101.x.x] Oh nooooooooooo
web0052 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
web0003 [04:28:54 2011] [error] [client 10.101.x.x] You've been eaten by a grue.
web0066 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!!!
web0001 [04:28:54 2011] [warning] [client 10.101.x.x] Sky is falling
web0020 [04:28:54 2011] [error] [client 10.101.x.x] Sky is falling.
web1101 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!
web0055 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh!
web0001 [04:28:54 2011] [warning] [client 10.101.x.x] Oh nooooooooooo
web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
web0034 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
web0087 [04:28:54 2011] [fatal] [client 10.101.x.x] Sky is falling.
Trending
fatals errors warnings
Logster
Run by cron
Maintains a cursor on log files
Simple parsing & aggregation
Output to Ganglia or Graphite
github.com/etsy
web0054 [Fri Mar 04 16:27:48 2011] [
error
]
[login] [mk04gw1p71] User login failed.
Reason: wrong password for ...
^.+ \[.+\] \[(
?P<log_level>
.+)\]
if (fields['
log_level
'] == “
fatal
”):
self.fatals += 1
elif (fields['
log_level
'] == “
error
”):
self.errors += 1
elif (fields['
log_level
'] == “
warning
”):
self.warnings += 1
...
MetricObject("
fatals
",
(
self.fatals
/ self.duration), "per sec")
MetricObject("
errors
",
(
self.errors
/ self.duration), "per sec")
MetricObject("
warning
",
(
self.warnings
/ self.duration), "per sec")
fatals errors warnings
Logster
Signed-in
vs.
Signed-out
github.com/etsy
Log
a

plethora
of data.
Don’t be afraid to use one file.
Use
custom fields
to segment data.
Correlate
errors to specific requests.
Make f#@k!ng
graphs.
Convert rates to
trend lines.
Take my logs.
Please!
Mike Brittain
Director of Engineering, Infrastructure
Etsy.com
mike@etsy.com
@mikebrittain
code
as
craft.etsy.com
github.com/etsy
Thank you.