PostgreSQL Python

decubitisannouncerData Management

Nov 27, 2012 (4 years and 8 months ago)

214 views

Putting Python in
PostgreSQL
Frank Wiles
frankrevsys.com
 fwiles  http://www.revsys.com
Tuesday, March 13, 12
Today we get to
combine two of my
favorite things....
Tuesday, March 13, 12
Why on earth would
you want to do that?

Build in Python rather than banging your
head against PL/PGSQL

Utilize PyPi packages

Use a network connection (ZMQ anyone?)

Retrofit features on a large/complex/
proprietary system
Tuesday, March 13, 12
Getting Started
Tuesday, March 13, 12
Installing pl/python

Ubuntu -
aptitude install
postgresql-plpython-<version>

Mac OS X Homebrew -
PYTHON = /
path/to/python brew install
postgresql
Tuesday, March 13, 12
Setting up the database

createdb <database name>

createlang plpythonu
<databasename>

Check with:
SELECT * FROM
pg_language
Tuesday, March 13, 12
Writing your first function
CREATE OR REPLACE FUNCTION
pymulti(a integer, b integer)
returns integer AS $$
return a * b
$$ LANGUAGE plpythonu;
Tuesday, March 13, 12
Writing your first function
CREATE OR REPLACE FUNCTION
pymulti(a integer, b integer)
returns integer AS $$

return a * b
$$ LANGUAGE plpythonu;
Tuesday, March 13, 12
Datatypes
PostgreSQL
Python
small int and int
int
bigint
long
boolean
bool
All text types
str
SQL Array
list
Custom Types
dict
Tuesday, March 13, 12
Debugging

Using
print
doesn’t work all that well.

You can log directly to PostgreSQL’s log file
with:
plpy.notice(“<msg>”)
plpy.debug(“<msg>”)
plpy.error(“<msg>”)
plpy.fatal(“<msg>”)
Tuesday, March 13, 12
Or log with Python...
import logging
logger =
logging.BasicConfig(
filename=’/tmp/plpy.log’
level=logging.INFO
)
logging.info(“Hi there!”)
Tuesday, March 13, 12
DANGER WILL
ROBINSON!

Kind of a pain to maintain and debug

Can easily confuse your DBA (if that isn’t
you)

Not exactly slow, but definitely not free

Use with caution: This should be in your back
of tricks if you need it, but use sparingly

Requires superuser privs and no virtualenvs
Tuesday, March 13, 12
When should you use
it?

Rolling up/aggregating data

Enforce constraints across teams/devs/
languages

Protect data integrity from ad hoc queries

Change/Add features at the DB level when
you can’t at the app level
Tuesday, March 13, 12
Triggers!
Tuesday, March 13, 12
Let’s say we are setup
like this...
CREATE TABLE trigger_test (
id serial,
username varchar,
is_active boolean default true,
balance int4
);
INSERT INTO trigger_test (username, is_active,
balance) values ('frankwiles', 't', 1000);
INSERT INTO trigger_test (username, is_active,
balance) values ('jacobkaplanmoss', 'f', 50);
INSERT INTO trigger_test (username, is_active,
balance) values (‘jefftriplett', 't', 10);
Tuesday, March 13, 12
Let’s ensure you can’t change an inactive
user’s balance
CREATE OR REPLACE FUNCTION
check_active()
returns trigger AS $$
class NotActive(Exception): pass
if not TD["old"]["is_active"]:
raise NotActive
$$ LANGUAGE plpythonu;
Tuesday, March 13, 12
If updating active and balance...
CREATE OR REPLACE FUNCTION
check_active()
returns trigger AS $$
class NotActive(Exception): pass
if not TD["old"]["is_active"]
and not TD["new"]["is_active"]:
raise NotActive
$$ LANGUAGE plpythonu;
Tuesday, March 13, 12
How to hook a trigger
to a table
CREATE TRIGGER double_check_active
BEFORE UPDATE ON trigger_test
FOR EACH ROW EXECUTE PROCEDURE
check_active();
Tuesday, March 13, 12
What happens?
pycon=# update trigger_test set balance = 60
where username=‘jacobkaplanmoss’;
ERROR: NotActive:
CONTEXT: Traceback (most recent call last):
PL/Python function "check_active", line 5,
in <module>
raise NotActive
PL/Python function "check_active"
pycon=#
Tuesday, March 13, 12
Info in TD
TD[“event”]
INSERT, UPDATE,
DELETE, TRUNCATE
TD[“when”]
BEFORE, AFTER, or
INSTEAD OF
TD[“level”]
ROW or STATEMENT
TD[“new”]/TD[“old”]
New and Old data
TD[“name”]
Name of trigger
TD[“table_name”]
Table trigger called on
TD[“table_schema”]
Schema table is in
TD[“args”]
Arguments to trigger
function
Tuesday, March 13, 12
CREATE TRIGGER options
CREATE TRIGGER
name

{BEFORE|AFTER} {
event
[ OR ... ] }
ON
table

[ FOR [EACH] {ROW |STATEMENT} ]
[ WHEN (
condition
) ]
EXECUTE PROCEDURE
function_name
(
arguments
)
Tuesday, March 13, 12
Checking credit card numbers
CREATE OR REPLACE FUNCTION
checkcc(ccnum varchar) returns boolean AS $$
sum = 0
num_digits = len(ccnum)
for count in range(0, num_digits):
digit = int(card_number[count])
if not (( count & 1 ) ^ oddeven ):
digit = digit * 2
if digit > 9:
digit = digit - 9
sum = sum + digit
if sum & 10 == 0:
return True
else:
return False

$$ LANGUAGE plpythonu;
Tuesday, March 13, 12
Let’s do something more
interesting...
CREATE TABLE msgs (
id serial,
to_user varchar,
from_user varchar,
read boolean default false,
message text
);
Tuesday, March 13, 12
CREATE OR REPLACE FUNCTION handle_insert()
RETURNS trigger AS $$
import redis

to_user = TD["new"]["to_user"]
unread_key = "unread-%s" % to_user
r = redis.Redis()
r.incr(unread_count_key)

$$ LANGUAGE plpythonu;
For INSERTs...
Tuesday, March 13, 12
CREATE OR REPLACE FUNCTION handle_update()
RETURNS trigger AS $$
import redis
to_user = TD["new"]["to_user"]
unread_key = "unread-%s" % to_user
r = redis.Redis()
if not TD["old"]["read"] and
TD["new"]["read"]:
r.decr(unread_count_key)
else:
r.incr(unread_count_key)
$$ LANGUAGE plpythonu;
For UPDATEs...
Tuesday, March 13, 12
CREATE OR REPLACE FUNCTION handle_delete()
RETURNS trigger AS $$
import redis
to_user = TD["new"]["to_user"]
unread_key = "unread-%s" % to_user
r = redis.Redis()
r.decr(unread_count_key)

$$ LANGUAGE plpythonu;
For DELETEs...
Tuesday, March 13, 12
Attach them to the table
CREATE TRIGGER counter_insert AFTER INSERT
ON msgs FOR
EACH ROW EXECUTE PROCEDURE
handle_insert();
CREATE TRIGGER counter_update AFTER
UPDATE ON msgs FOR
EACH ROW EXECUTE PROCEDURE
handle_update();
CREATE TRIGGER counter_delete AFTER
DELETE ON msgs FOR
EACH ROW EXECUTE PROCEDURE
handle_delete();
Tuesday, March 13, 12
What other things can
you do?
Tuesday, March 13, 12
Full database access!
CREATE OR REPLACE FUNCTION
db_count() RETURNS trigger AS $$

if TD[“new”][“active”] == False:
plpy.execute(
“INSERT INTO finished_tickets
(ticket_id) VALUES (%d)” % TD[“new”]
[“id”]

$$ LANGUAGE plpythonu;
Tuesday, March 13, 12
Internal ideas...

Aggregate/rollup data

Push “noisy” calculations into the DB to
avoid round trips

Build and maintain materialized views

Regenerate expensive reports only when
the underlying data changes enough to
warrant it
Tuesday, March 13, 12
More external ideas...

Send emails based on data additions/changes

Expire/repopulate caches

Fire off celery tasks

Trigger backups based on % of data changed

Hit external APIs based on conditions

Lots of fun to be had with ZeroMQ...
Tuesday, March 13, 12
Retrofitting an existing
system

Send email alerts

Populate your system with data based on
INSERTs in another

Want an SMS when your boss adds a ticket
to Trac, but don’t want to dive into the
code?
Tuesday, March 13, 12
Questions?
frankrevsys.com
 fwiles 
http://www.revsys.com
Tuesday, March 13, 12