AM403 - Bioinformatics 1 - Bioinf.org.uk

clumpfrustratedBiotechnology

Oct 2, 2013 (3 years and 9 months ago)

73 views

Perl/DBI
-

accessing
databases from Perl

Dr. Andrew C.R. Martin

martin@biochem.ucl.ac.uk

http://www.bioinf.org.uk/

Aims and objectives


Understand
the need to access
databases from Perl


Know
why DBI?


Understand the structure of
DBI


Be able to
write a Perl/DBI script

to
read from or write to a database


PRACTICAL
: write a script to read from
a database

Why access a database
from Perl?

CGI

Script

Why access a database
from Perl?

Send request for page

to web server

Pages

External

Programs

RDBMS

Web browser

Web server

CGI can extract parameters sent with the page request

Why access a database
from Perl?

Populating databases


Need to pre
-
process and re
-
format data
into SQL


Intermediate storage during data
processing

Reading databases


Good database design can lead to
complex queries (wrappers)


Need to extract data to process it

Why use DBI?

Why Perl/DBI?

Many relational databases available.


Commercial

examples:


Oracle


DB/2


SQLServer


Sybase


Informix


Interbase


Open source

examples:


PostgreSQL


mySQL

Why Perl/DBI?


Databases use a
common query
language
:


‘structured query language’

(SQL)



Queries can easily be ported between
different database software


Minor variations in more advanced features


Proprietary extensions

Why Perl/DBI?


Can call command
-
line interface from
within your program.



$result = `psql
-
tqc “SELECT * FROM table” `;


@tuples = split(/
\
n/, $result);

foreach $tuple (@tuples)

{


@fields = split(/
\
|/, $tuple);

}

Inefficient
: new process

for each database access

Why Perl/DBI?


Databases generally provide
own APIs

to allow access from programming
languages


e.g. C, Java, Perl


Proprietary APIs
all differ


Very
difficult to port

software
between databases


Standardized APIs

have thus become
available

Why Perl/DBI?


Perl/DBI is the
standardized API

for Perl


Easy to port Perl scripts from one database
to another



DBI and ODBC


ODBC

(Open DataBase Connectivity)


Consortium of vendors in early 1990s


SQL Access Group


October 1992 & 1993,
draft standard
:


‘Call Level Interface’ (‘CLI’
-

an API)


Never really adopted


Microsoft

‘embraced and extended’

it
to create ODBC


DBI and ODBC



DBPerl
’ designed as database interface
specifically for Perl4


September 1992 (i.e.
pre
-
ODBC
)


Just before release, Perl5 announced
with OO facilities


DBPerl modified to support OO and
loosely modelled on
CLI

standard


This became
DBI


DBI and ODBC







Standard SQL syntax

Dodged this issue!

Standard error codes

Check $DBI::err or

$DBI::errstr (DBI

provides methods for

standard errors, but

drivers don’t use them)

Meta
-
data on

database structure

Tables and types

only

Many attributes and

options to tweak

underlying driver

Very limited control

ODBC

DBI

DBI and ODBC


There is an
ODBC driver

for DBI


DBI can be used to access any ODBC
database

The structure of DBI

DBI Architecture

Oracle

driver

mySQL

driver

PostgreSQL

driver

Sybase

driver

DBI

DBI Architecture


Multi
-
layer design

Perl script

DBI

DBD

Database API

RDBMS

Database

Independent

Database

Dependent

DBD::Oracle

DBD::Oracle

DBD::Oracle

Perl

Script

Perl

Script

DBI Architecture

Perl

Script

DBI

DBD::mysql

DBD::Oracle

DBD::Pg

Oracle

mySQL

PostgreSQL

DBI Architecture

Returns a list of installed (DBD) drivers

@drivers = DBI
-
>available_drivers();

DBI Architecture

DBD::Oracle

Driver Handle

DBD::pg

Driver Handle

Database

Handle

Database

Handle

Database

Handle

Statement

Handle

Statement

Handle

Statement

Handle

Statement

Handle

Statement

Handle

Statement

Handle

Statement

Handle

DBI Architecture

Driver Handles



References
loaded driver
(s)


One per driver



Not normally referenced in programs


Standard variable name
$drh

DBI Architecture

Database Handles


Created by connecting to a database




References a database

via a driver


Can have
many per database

(e.g.
accessing different user accounts)


Can access
multiple databases


Standard variable name
$dbh

$dbh = DBI
-
>connect($datasource, ... );

DBI Architecture

Statement Handles


Created by ‘preparing’ some SQL




Can have
many per database handle

(e.g. multiple queries)


Standard variable name
$sth


$sth = $dbh
-
>prepare($sql);

DBI Architecture

DBD::pg

Driver Handle

Database

Handle

Statement

Handle

$dbh = DBI
-
>connect($datasource, ... );

$sth = $dbh
-
>prepare($sql);

SQL Preparation

prepare()

Perl script

DBI and Driver

Database

Pass statement

to database engine

$sth

Encapsulate as DBI

statement handle

Parse

Statement

Encapsulate

Statement

if valid

Pass statement

to database engine

execute()

fetchrow_array()

Maintain cursor

into results

Execute

Statement

Writing Perl/DBI scripts

Accessing DBI from Perl


Must have the
DBI package

and appropriate
DBD package

installed


DBD::Oracle, DBD::Pg, DBD::mysql, etc.

use DBI;

$dbh = DBI
-
>connect($datasource, $username, $password);


Create a ‘handle’ to access the database:

The DBD module

and d/b to be used

Data sources


$dbname = “mydatabase”;

$dbserver = “dbserver.cryst.bbk.ac.uk”;

$dbport = 5432;


$datasource = “dbi:Oracle:$dbname”;

$datasource = “dbi:mysql:database=$dbname;host=$dbserver”;

$datasource = “dbi:Pg:dbname=$dbname;host=$dbserver;port=$dbport”;


Format varies with

database module

$dbh = DBI
-
>connect($datasource, $username, $password);

Optional; supported by

some databases.


Default: local machine

and default port.

Username and password


$username

and
$password

also optional


Only needed if you normally need a
username/password to connect to the
database.


Remember CGI scripts run as a
special web
-
server user
.


Generally, ‘nobody’ or ‘apache’.


Database must allow access by this user


or specify a different username/password

$dbh = DBI
-
>connect($datasource, $username, $password);

SQL commands with no
return value


SQL commands other that SELECT don’t
return values


may return success/failure flag


number of entries in the database affected



For example:


creating a table


inserting a row


modifying a row

SQL commands with no
return value


From Perl/DBI:

INSERT INTO idac VALUES (‘LYC_CHICK’, ‘P00698’)


e.g. insert a row into a table:

$sql = “INSERT INTO idac VALUES (‘LYC_CHICK’, ‘P00698’)”;

$dbh
-
>do($sql);

SQL commands that return
a single row


Sometimes, can
guarantee

that a database
query will return only one row


or you are only interested in the first row

$sql = “SELECT * FROM idac WHERE ac = ‘P00698’”;

@values = $dbh
-
>selectrow_array($sql);


Columns

placed in an array


Could also have been placed in a list:

($id, $ac) = $dbh
-
>selectrow_array($sql);

SQL commands that return
a multiple rows


Most SELECT statements will return many rows



Three stages must be performed:


preparing

the SQL


executing

it


extracting

the results

SQL commands that return
a multiple rows

(Can also obtain array or hash reference)

NB: statement handle /
fetchrow_array


rather than db handle /
selectrow_array

$sql = “SELECT * FROM idac”;

$sth = $dbh
-
>prepare($sql);

if($sth
-
>execute)

{


while(($id, $ac) = $sth
-
>fetchrow_array)


{


print “ID: $id AC: $ac
\
n”;


}

}

SQL commands that return
a multiple rows


If you need to stop early you can do:

$sql = “SELECT * FROM idac”;

$sth = $dbh
-
>prepare($sql);

if($sth
-
>execute)

{


for($i=0; $i<10; $i++)


{


if(($id, $ac) = $sth
-
>fetchrow_array)


{


print “ID: $id AC: $ac
\
n”;


}


}


$sth
-
>finish;

}

SQL commands that return
a multiple rows


A utility method is also available to print a
complete result set:

$sql = “SELECT * FROM idac”;

$sth = $dbh
-
>prepare($sql);

if($sth
-
>execute)

{


$nrows = $sth
-
>dump_results;

}

(Mostly useful for debugging)

Repeated SQL calls


Often want to repeat essentially the
same
query
, but with some
different value

being
checked.


For example:

foreach $ac (‘P00698’, ‘P00703’)

{


$sql = “SELECT * FROM idac WHERE ac = ‘$ac’”;


@values = $dbh
-
>selectrow_array($sql);


print “@values
\
n”;

}

(using special option for 1
-
row returns)

Repeated SQL calls


Could also be do:

foreach $ac (‘P00698’, ‘P00703’)

{


$sql = “SELECT * FROM idac WHERE ac = ‘$ac’”;


$sth = $dbh
-
>prepare($sql);


$sth
-
>execute;


while(@values = $sth
-
>fetchrow_array)


{


print “@values
\
n”;


}

}

i.e. don’t use special option for 1
-
row returns


$dbh
-
>selectrow_array($sql)

Repeated SQL calls


Increase in performance by ‘binding’ a variable:

$sql = “SELECT * FROM idac WHERE ac = ?”;

$sth = $dbh
-
>prepare($sql);


foreach $ac (‘P00698’, ‘P00703’)

{


$sth
-
>bind_param(1, $ac);


$sth
-
>execute;


while(@values = $sth
-
>fetchrow_array)


{


print “@values
\
n”;


}

}

Parameter

Number

Variable

to bind

Repeated SQL calls

NOTE:


Performance increase
depends

on database
and driver


Although strings normally enclosed in single
inverted commas, the
bound variable is not
quoted
.


If you have a number which you need to be
treated as a string, then you do:



$sth
-
>bind_param(1, 42, SQL_VARCHAR);

Summary


DBI provides a
standard API


It does not standardize the
SQL


DBI is an
older

standard than ODBC


They can be used together and they are
both
evolving


Basic
3
-
step

process:


prepare / execute / fetch


Shortcut

calls for no return or 1
-
row return


Many other functions

available