Integrating Perl and Databases

greenbeansneedlesSoftware and s/w Development

Dec 13, 2013 (3 years and 8 months ago)

73 views

Advanced DBI

Perl Conference 3.0 Tutorial

August 1999


Tim Bunce


2

Advanced DBI
tutorial

by Tim Bunce

August 1999

Topical Topics


Wheels within Wheels


Error Checking and Handling


Transactions


Speed Speed Speed!


Portability


The Power of the Proxy


DBI for the Web


Handling LONG/BLOB data


Tainting


Flexing the Multiplex

Wheels within Wheels

The DBI architecture

and how to watch it at work

4

Advanced DBI
tutorial

by Tim Bunce

August 1999

Under the hood


DBI defines and implements an interface


Driver modules do much of the real work


DBI provides default methods, functions, tools etc for
drivers


Not limited to the lowest common denominator
-

mechanism provided for driver specific extensions


Designed and built for speed


Valuable detailed call tracing/debugging built
-
in

5

Advanced DBI
tutorial

by Tim Bunce

August 1999

DBI Module

Perl Application

DBD::
Other

DBD::Informix

DBD::Oracle

Oracle Server

Informix Server

Other

Server

A picture is worth?

6

Advanced DBI
tutorial

by Tim Bunce

August 1999

Setting the scene


Inner and outer worlds


Application and Drivers



Inner and outer handles


DBI handles are references to tied hashes



The DBI Method Dispatcher


gateway between the worlds and the heart of the DBI




Now we'll go all deep and visual for a while
...

7

Advanced DBI
tutorial

by Tim Bunce

August 1999

Architecture of the DBI classes #1

‘’outer’’

DBD::A::dr

DBD::A::db

DBD::A::st

DBI

DBI::dr

DBI::db

DBI::st

DBI handle classes visible to applications.

These classes are effectively ‘empty’.

DBD::B::dr

DBD::B::db

DBD::B::st

Parallel handle classes implemented by drivers.

DBD::_::common

DBD::_::dr

DBD::_::db

DBD::_::st

‘’inner’’

Base classes

providing

fallback

behavior.

8

Advanced DBI
tutorial

by Tim Bunce

August 1999

Architecture of the DBI classes #2

method

method

method

method

method

method

method

DBD::A::db

DBD::B::db

DBD::A::st

method

method

DBI::st








Application

makes calls

to methods

using DBI
handle

objects

method

method

method

method

method

method

DBI::db

‘’outer’’

dispatch

DBI

method

method

method

method

method

DBI::_::db

method

method

DBI::_::common

‘’inner’’

9

Advanced DBI
tutorial

by Tim Bunce

August 1999

Anatomy of a DBI handle

Handle

Ref.

DBI

Magic

‘’outer’’

‘’inner’’

struct imp_dbh_t {



struct dbih_dbc_t com;



… implementers …


… own data ...


}

struct dbih_dbc_t {



… DBI data ...


}

Tied

Hash

Hash

Tie

Magic

DBI::db

DBI::db

10

Advanced DBI
tutorial

by Tim Bunce

August 1999

Method call walk
-
through


Consider a simple prepare call:


$dbh
-
>prepare(...)


$dbh

is reference to a DBI::db object (regardless of driver)


DBI::db::prepare

is an alias for DBI dispatch method


Dispatch calls driver’s prepare method
something

like this:


my $inner_hash_ref = … # from tie magic


my $implementor_class = … # from DBI magic data


$inner_hash_ref
-
>$implementor_class::prepare(...)


Driver code gets the inner hash so it has fast access to the hash
contents

11

Advanced DBI
tutorial

by Tim Bunce

August 1999

Watching the DBI in action


DBI has detailed call tracing built
-
in


The trace can be very helpful in understanding application
behavior and for debugging


Shows parameters and results


Trace information can be written to a file


Not used often enough!

12

Advanced DBI
tutorial

by Tim Bunce

August 1999

Our program for today...


1:

#!/usr/bin/perl
-
w


2:


3:

use DBI;


4:


5:

$dbh = DBI
-
>connect('', '', '', { RaiseError => 1 });


6:


7:

$upd = $dbh
-
>prepare("UPDATE prices SET price=? WHERE prod=?");


8:

$ins = $dbh
-
>prepare("INSERT INTO prices (prod,price) VALUES(?,?)");


9:

$rows = $upd
-
>execute(42, "Widgets");

10:

$ins
-
>execute("Widgets", 42) if $rows == 0;

11:

12:

$dbh
-
>disconnect;

13

Advanced DBI
tutorial

by Tim Bunce

August 1999

Enabling Tracing


Per handle

$h
-
>trace($level);

$h
-
>trace($level, $filename);


Only effects that handle and any new child handles created from it


Child handles get trace level of parent in effect at time of creation


Global (internal to application)

DBI
-
>trace(...);


Sets effective global default
minimum

trace level


Global (external to application)


Enabled using
DBI_TRACE

environment variable

DBI_TRACE=digits



DBI
-
>trace(digits);

DBI_TRACE=filename



DBI
-
>trace(2, filename);

DBI_TRACE=digits=filename


DBI
-
>trace(digits, filename);

(extra)

14

Advanced DBI
tutorial

by Tim Bunce

August 1999

Trace level 1


Trace level 1 shows method results and line numbers:



<
-

connect= DBI::db=HASH(0xe0abc)
at DBI.pm line 356
.


<
-

STORE
('PrintError', 1)
= 1 at DBI.pm line 382.


<
-

STORE('AutoCommit', 1)= 1 at DBI.pm line 382.


<
-

STORE('RaiseError', 1)= 1 at DBI.pm line 382.



<
-

prepare('UPDATE …')= DBI::st=HASH(0xe1238) at test.pl line 7.


<
-

prepare('INSERT …')= DBI::st=HASH(0xe1504) at test.pl line 8.


<
-

execute= '0E0' at test.pl line 9.


<
-

execute= 1 at test.pl line 10.


<
-

disconnect= 1 at test.pl line 11.



<
-

DESTROY= undef


<
-

DESTROY= undef


<
-

DESTROY= undef


<
-

DESTROY= undef

15

Advanced DBI
tutorial

by Tim Bunce

August 1999

Trace level 2 and above


Trace level 2 shows calls with parameters and more:




-
>

connect for DBD::ODBC::dr (DBI::dr=HASH(0x13dfec)~0xe14a4


'' ''
****

HASH(0xe0a10))


<
-

connect=
DBI::db=HASH(0xe0ab0)

at DBI.pm line 356.



-
> STORE for DBD::ODBC::db (DBI::db=HASH(0xe0abc)~INNER 'PrintError' 1)


<
-

STORE= 1 at DBI.pm line 382.



-
> prepare for DBD::ODBC::db (
DBI::db=HASH(0xe0ab0)~0xe0abc


'UPDATE prices SET price=? WHERE prod=?')


<
-

prepare= DBI::st=HASH(0xe1274) at test.pl line 7.



Trace level 3 and above shows more processing details


Error Checking and Handling

To err is human, to detect, divine.

17

Advanced DBI
tutorial

by Tim Bunce

August 1999

The importance of error checking


Errors happen!


Failure happens when you don't expect errors!


database crash / network disconnection


lack of disk space for insert or select (sort space for order by)


server math error on select (divide by zero)


and maybe, just maybe, errors in your own code
(Gasp!)



Beat failure by expecting errors!



Detect errors early to limit effects


Defensive Programming, e.g., check assumptions


Through Programming, e.g., check for errors after fetch loops

18

Advanced DBI
tutorial

by Tim Bunce

August 1999

Error checking
-

ways and means


Error checking the hard way...


$h
-
>method || die "DBI method failed: $DBI::errstr";

$h
-
>method || die "DBI method failed: $DBI::errstr";

$h
-
>method || die "DBI method failed: $DBI::errstr";



Error checking the easy way...


$h
-
>{RaiseError} = 1;

$h
-
>method;

$h
-
>method;

$h
-
>method;

19

Advanced DBI
tutorial

by Tim Bunce

August 1999

Handling errors the smart way


For simple applications immediate death on error is fine



For more advanced applications greater control is needed



Life after death:

$h
-
>{RaiseError} = 1;

eval { … $h
-
>method; … };

if ($@) { … }



Bonus prize


Other, non
-
DBI, code within the eval block may also raise an
exception that will be caught and handled cleanly

Transactions

To do or to undo,

that is the question

21

Advanced DBI
tutorial

by Tim Bunce

August 1999

Transactions
-

What's it all about?


Far more than just locking


The A.C.I.D. test


True transactions give true safety


even from
power failures

and
system crashes
!


Incomplete transactions are automatically rolled
-
back by the
database server when it's restarted.


Also removes burden of undoing incomplete changes


Hard to implement (for the vendor) and has significant
performance cost


A very large topic worthy of an entire tutorial

22

Advanced DBI
tutorial

by Tim Bunce

August 1999

Transactions
-

Life Preserving


Classic:

system crash between one bank account being debited and
another being credited.


Dramatic:
power failure during update statement on 3 million rows
after 2 minutes when only part way through.


Real
-
world:
complex series of inter
-
related updates, deletes and
inserts on many separate tables fails at the last step due to a
duplicate primary key on an insert.



Transaction recovery would handle
all

these situations automatically.


Makes a system far more robust and trustworthy over the long term.


Use transactions if your database supports them.


If it doesn't and you
need

them, switch to a different database.

23

Advanced DBI
tutorial

by Tim Bunce

August 1999

Transactions
-

How the DBI helps


Tools of the trade:


Set

AutoCommit

off, and
RaiseError

on


Wrap
eval { … }

around the code


Use

$dbh
-
>commit;

and
$dbh
-
>rollback;


Disable AutoCommit via
$dbh
-
>{AutoCommit} = 0;


to enable transactions and thus rollback
-
on
-
error


Enable RaiseError via
$dbh
-
>{RaiseError} = 1;


to automatically 'throw an exception' on error


The surrounding
eval { … }


catches the exception and puts the error message into
$@


Test
$@

and
rollback
() if set, else
commit
()

24

Advanced DBI
tutorial

by Tim Bunce

August 1999

Transactions
-

Example code

$dbh
-
>{RaiseError} = 1;

eval {


$dbh
-
>
method(…)
; # assorted DBI calls


foo(...); # application code


$dbh
-
>commit; # commit the changes

};

if ($@) {


warn "Transaction aborted because $@";


$dbh
-
>rollback;


...

}

25

Advanced DBI
tutorial

by Tim Bunce

August 1999

Transactions
-

Further comments


The
eval { … }

catches
all

exceptions


not just from DBI calls. Also catches fatal runtime errors from Perl.


Put
commit()

inside the eval


ensures commit failure is caught cleanly.


remember that
commit
()

itself may fail for many reasons.


Don't forget that
rollback()

may also fail


due to database crash or network failure etc.


Other points:


Always explicitly commit or rollback before
disconnect
().


Destroying a connected $dbh
should

always rollback.


END blocks can catch exit
-
without
-
disconnect to rollback and
disconnect cleanly.

Speed

Speed

Speed
!

What helps,what doesn't

27

Advanced DBI
tutorial

by Tim Bunce

August 1999

Give me s
p
e
e
d
!


DBI was
designed

for speed from day one


DBI dispatch written in hand
-
crafted XS/C


Dispatch to XS driver method calls optimized


Cached attributes returned directly by DBI dispatch


DBI overhead is generally insignificant






So we'll talk about other speed issues instead ...

28

Advanced DBI
tutorial

by Tim Bunce

August 1999

Partition for speed


Application partitioning


stop and think
-

work smarter not harder


Pick the right database for the job, if you have the choice


Work close to the data


Use stored procedures where appropriate


Moving data to/from the client is always expensive


think of latency as well as bandwidth


Use proprietary bulk
-
load where appropriate


Consider database maths vs. Perl maths


Multiple simple queries with 'joins' in Perl may be faster


Consider local caching, in memory or DBM file etc.


Mix 'n Match techniques as needed

29

Advanced DBI
tutorial

by Tim Bunce

August 1999

Prepare for speed


prepare()

-

what happens in the server


Receive and parse the SQL statement into internal form


Get details for all the selected tables


Check access rights for each


Get details for all the selected fields


Check data types in expressions


Get details for all the indices of the tables


Develop an optimised query 'access plan' for best execution


Return a handle for all this cached information



This can be an expensive process
-

especially the 'access plan'


Some databases, like MySQL, don't cache the information but
have simpler, and thus faster, plan creation

30

Advanced DBI
tutorial

by Tim Bunce

August 1999

The Best Laid Plans


Query optimisation is hard


Intelligent high quality cost based query optimisation is
really

hard!


Know your optimiser


Oracle, Informix, Sybase, DB2, SQL Server etc. all slightly different.


Check what it's doing


Use tools to see the plans used for your queries
-

very helpful


Help it along


Most 'big name' databases have a mechanism to analyse and store
the key distributions of indices to help the optimiser make good plans


beware: keep it fresh, old key distributions might be worse than none


Some also allow you to embed 'hints' into the SQL as comments


beware: take it easy, over hinting hinders dynamic optimisation

(extra)

31

Advanced DBI
tutorial

by Tim Bunce

August 1999

Respect your server's SQL cache


Optimised Access Plan etc. is cached within the server


keyed by the exact original SQL string used



Without placeholders, SQL string varies each time


so cached one is not reused


and time is wasted creating a new access plan


the new statement and access plan are added to cache


so the cache fills and other statements get pushed out



Compare
do("insert … $id");


with
do("insert … ?", $id);

32

Advanced DBI
tutorial

by Tim Bunce

August 1999

Hot handles


Use
prepare()

to get a handle on the statement in the
SQL cache
and keep it there
.


The handle for the pre
-
prepared statement avoids a round
-
trip to server for cache
-
check / parsing etc.


Avoid using
$dbh
-
>do(…)

in a loop.


Use
$sth = $dbh
-
>prepare(…)


and
$sth
-
>execute()

instead.


For example, convert looped
do("insert … ?", $id)


to
prepare("insert … ?")


and looped
execute($id).

33

Advanced DBI
tutorial

by Tim Bunce

August 1999

Let the DBI cache your handles


Sometimes it's not easy to hold all your handles


e.g., library code to lookup values from the database given a
$dbh


The

prepare_cached()

method gives you a client side
statement handle cache:


while ( ($field, $value) = each %search_fields ) {


push @sql, "$field = ?";


push @values, $value;


}


$where = "";


$where = "where ".join(" and ", @sql) if @sql;


$sth = $dbh
-
>prepare_cached("select * from table $where");


$sth
-
>execute(@values);


Can avoid the need for global statement handle variables

34

Advanced DBI
tutorial

by Tim Bunce

August 1999

Sling less for speed


while(@row = $sth
-
>fetchrow_array) { }


one field: 3,100 fetches per cpu second


ten fields: 1,000 fetches per cpu second



while($row = $sth
-
>fetchrow_arrayref) { }


one field: 5,300 fetches per cpu second


ten fields: 4,000 fetches per cpu second



Notes:


Timings made on an old SPARC 10 using DBD::Oracle


Timings assume instant record fetch within driver


Fields all one char.
@row

would be
even slower

for more/bigger fields


Use
bind_columns()

for direct

access to fetched fields

35

Advanced DBI
tutorial

by Tim Bunce

August 1999

Keep a handle on your databases


Connecting to a database can be slow


Try to connect once and stay connected where practical


We'll discuss web server issues later



New method:


The new

connect_cached()

method acts like
prepare_cached()

but for database handles


It also checks the connection and reconnects if it's broken


Similar, but not quite the same as Apache::DBI


New and subject to change with experience


Potentially most useful with DBD::Proxy & DBI::ProxyServer

Portability

A Holy Grail

To be taken with a pinch of salt

37

Advanced DBI
tutorial

by Tim Bunce

August 1999

Portability in practice


Portability requires care and testing
-

it can be tricky


Platform Portability
-

the easier bit


Availability of database client software (and server if required)


Availability of DBD driver


DBD::Proxy can address both these issues
-

see later


Database Portability
-

more tricky


Differences in SQL dialects cause most problems


Differences in data types can also be a problem


Driver capabilities (placeholders etc)


DBIx::Compat module (in DBIx::RecordSet) may be useful.


A standard DBI test suite is needed (and planned).

The Power of the Proxy

Thin clients ...

and other buzz words

39

Advanced DBI
tutorial

by Tim Bunce

August 1999

DBD::Proxy & DBI::ProxyServer


Networking for Non
-
networked databases


DBD::Proxy driver forwards calls over network to remote
DBI::ProxyServer


No changes in application behavior


Only the
DBI
-
>connect

statement needs to be changed


Proxy can be made completely transparent by setting the
DBI_AUTOPROXY

environment variable


So not even the
DBI
-
>connect

statement needs to be changed!


DBI::ProxyServer works on Win32


Access to Access and other Win32 ODBC and ADO data sources


Developed by Jochen Wiedmann

40

Advanced DBI
tutorial

by Tim Bunce

August 1999

A proxy picture

Application

DBI

RPC::pClient

Storable

IO::Socket

DBD::Foo

DBI

DBI::ProxyServer

RPC::pServer

IO:Socket

Storable

Network

DBD::Proxy

41

Advanced DBI
tutorial

by Tim Bunce

August 1999

Thin clients and other buzz words


Proxying for remote access: "thin
-
client"


No need for database client code on the DBI client


Proxying for network security: "encryption"


Can use Crypt::IDEA, Crypt::DES etc.


Proxying for "access control" and "firewalls"


handy for web servers


Proxying for action control


e.g., only allow specific select or insert statements


Proxying for local access: "n
-
tier middleware"



connect_cached()
, memory use, licensing limits


Proxying for performance: "compression"


Can compress data transfers using Compress::Zlib


DBI for the Web

Hand waving from 30,000 feet

43

Advanced DBI
tutorial

by Tim Bunce

August 1999

Web DBI
-

Connect speed


Databases can be slow to connect


Traditional CGI
forces

a new connect per request


Move Perl and DBI into the web server


Apache with mod_perl and Apache::DBI module


Microsoft IIS with ActiveState's PerlEx


Connections can then persist between requests


Apache::DBI automatically used by DBI if loaded


No CGI script changes required to get persistence


Take care not to change the shared session behaviour


Leave the
$dbh

in the same state you found it!


Other alternatives include


FastCGI, CGI::SpeedyCGI and CGI::MiniSvr

44

Advanced DBI
tutorial

by Tim Bunce

August 1999

Web DBI
-

Too many connections


Limits on Database Connections


Database platform resources or licensing


Memory consumption of web servers



Partition web servers into General and Database



Direct database work to the Database web servers


Use Reverse Proxy / Redirect / Rewrite to achieve this


Allows each subset of servers to be tuned to best fit workload


And be run on appropriate platforms

45

Advanced DBI
tutorial

by Tim Bunce

August 1999

Web DBI
-

State
-
less
-
ness


No fixed client
-
server pair


Each request can be handled by a different process.


So can't simply stop fetching when one page is complete and continue
fetching from the same
$sth

when the next page is requested.


And transactions can't span requests. Even if they could you'd have
problems with database locks being held etc.


Need access to 'accumulated state' somehow:


via the client (e.g., hidden form fields)


via the database (records in a session_state table)


via the web server file system (DBM files etc)


Need to purge old state info if stored on server, so timestamp it


See Apache::Session module


DBI::ProxyServer + connect_cached with session ID may suit

46

Advanced DBI
tutorial

by Tim Bunce

August 1999

Web DBI
-

Browsing pages of results


Re
-
execute query each time then count/discard (simple but expensive)


works well for small results sets or where users rarely view many pages


fast initial response, degrades gradually for later pages


count/discard in server is better but still inefficient for large result sets


affected by inserts and deletes from other processes


Re
-
execute query with where clause using min/max keys from last results


works well where original query can be qualified in that way
, not common


Select and cache full result rows somewhere for fast access


can be expensive for large result sets with big fields


Select and cache only the row keys, fetch full rows as needed


optimisation of above, use ROWID if supported, "select … where … in (…)"


If data is static and queries predictable then custom pre
-
built indexes may be
useful


The cache can be on web server, using DBM file for example, or on database
server using a table keyed by session id

47

Advanced DBI
tutorial

by Tim Bunce

August 1999

Web DBI
-

Concurrent editing


How to prevent updates overwriting each other?


You can use Optimistic Locking via 'qualified update':

update table set ...=...

where key = $old_key

and field1 = $old_field1

and field2 = $old_field2 and …
for all fields


Potential problems with floating point data values not matching


Some databases support a high
-
resolution 'update timestamp'
field that can be checked instead


Check the update row count


If it's zero then you know the record has been changed or deleted
by another process

48

Advanced DBI
tutorial

by Tim Bunce

August 1999

Web DBI
-

Tips for the novice


Test one step at a time


Test perl + DBI + DBD driver outside the web server first


Test web server + non
-
DBI CGI next


Remember that CGI scripts run as a different user with a different
environment
-

expect to be tripped up by that


DBI trace is your friend
-

use it


Use the Perl "
-
w" and "
-
T" options. Always "use strict;"


Read and inwardly digest the WWW Security FAQ:



http://www
-
genome.wi.mit.edu/WWW/faqs/www
-
security
-
faq.html


Read the Perl CGI FAQ:


http://www.perl.com/cgi
-
bin/pace/pub/doc/FAQs/cgi/perl
-
cgi
-
faq.html


And, if appropriate, read the mod_perl information available from:


http://perl.apache.org

49

Advanced DBI
tutorial

by Tim Bunce

August 1999

Handling LONG/BLOB data


What makes LONG / BLOB / MEMO data special?


Not practical to pre
-
allocate fixed size buffers for worst case



Fetching LONGs
-

treat as normal fields after setting:


$dbh
-
>{LongReadLen}

-

buffer size to allocate for expected data


$dbh
-
>{LongTruncOk}

-

should truncating
-
to
-
fit be allowed



Inserting LONGs


The limitations of string literals


The benefits of placeholders



Chunking / Piecewise processing not supported


So you're limited to available memory


Some drivers support an unofficial
blob_read()

method

50

Advanced DBI
tutorial

by Tim Bunce

August 1999

DBI security tainting


By default the DBI ignores Perl tainting


doesn't taint returned data


doesn't check that parameters are not tainted



The Taint attribute enables that behaviour


If Perl itself is in taint mode



Each handle has it's own inherited Taint attribute


So can be enabled for a connection and disabled for particular
statements, for example:


$dbh = DBI
-
>connect(…, { Taint => 1 });

$sth = $dbh
-
>prepare("select * from safe_table");

$sth
-
>{Taint} = 0; # no tainting on this handle

Flexing the Multiplex

A new kid on the block

52

Advanced DBI
tutorial

by Tim Bunce

August 1999

DBD::Multiplex


DBD::Multiplex
-

new and experimental


Connects to multiple databases at once (via DBI)


Executes statements on all of them by default


Could be configured to:


insert

into all databases but
select

from one


fallback to alternate database if primary is unavailable


select

from all and check results (pick most common)


select

round
-
robin / or at random to distribute load


Can be used with DBD::Proxy, either above or below


May also acquire fancy caching in later versions


Watch this space!

53

Advanced DBI
tutorial

by Tim Bunce

August 1999

Reference Materials


http://www.symbolstone.org/technology/perl/DBI


The DBI Home Page


http://www.isc.org/dbi
-
lists.html


To subscribe to the DBI Mailing Lists


http://www.perl.com/CPAN/authors/id/TIMB/DBI_Talk1_1997.tar.gz


My DBI session at The Perl Conference 1.0
-

general introduction


http://www.perl.com/CPAN/authors/id/TIMB/DBI_Talk2_1998.tar.gz


My DBI session at The Perl Conference 2.0
-

more depth


http://www.perl.com/CPAN/authors/id/TIMB/DBI_Talk3_1999.tar.gz


This DBI tutorial at The Perl Conference 3.0!


http://www.oreilly.com/catalog/perldbi/


or
http://www.amazon.com/exec/obidos/ASIN/1565926994/dbi


The forthcoming DBI book! Due late 1999 or early 2000.


(Order in advance at the conference to get special discounted rate :
-
)

The end.


Till next year...