Finding and Reporting Postgres Bug #7553 - SEAPUG

candlewhynotΔιαχείριση Δεδομένων

31 Ιαν 2013 (πριν από 4 χρόνια και 8 μήνες)

137 εμφανίσεις

By: Lloyd Albin

9/28/2012

Restructure the database by moving tables, views, and
functions to new schemas and sometimes renaming
those tables, views, or functions at the same time. Some
tables can’t be moved so you need to create views in the
new schema to reference the old tables and then update
any view to reference the new view instead of the table.


The first task is to move the tables, views and functions
to there new locations and rename them is necessary.
We must rename them first to keep from having two
items with the same name in the new schemas. This is
done by the following simple Postgres commands.


ALTER TABLE
schema.table

RENAME TO
new_table
;

ALTER TABLE
schema.table

SET SCHEMA
new_schema
;

Some tables can’t be moved so you need to create views
in the new schema to reference the old
tables.


CREATE OR REPLACE VIEW
new_schema.new_view

AS

SELECT * FROM
old_schema.old_table
;

This is the hard part. We need to write a function to
update existing views to reference the new views instead
of the old tables.


To do this part I wrote a
plpgsql

function that reads the
view
definition from
information_schema.views

and
modfies

by using
regexp_replace

to update the references. At the
same time I added to the start of the view the ‘
CREATE
OR REPLACE VIEW
schema.view

AS
’ and then
EXECUTE

the
new view thereby saving it.

I found that some views during the
EXECUTE

were
failing. So figuring the problem must be within the
function that I just wrote, time to start debugging my
function.


To make debugging easier, I added two lines to show me
the DDL for the old and new views.


RAISE NOTICE ‘Old View
\
n%

,
old_view
;

RAISE NOTICE ‘New
View
\
n%’,
new_view
;

What I found was that the function was changing the
code properly between the old and new views.


FROM
old_schema.old_table

to

FROM
new_schema.new_view


old_table.field

to

new_view.field

I tried to execute the new
SELECT

statement and it also
failed with error messages.


In trying to figure out the problem, I ended up trying to
execute the old
SELECT

statement and again it failed
with error messages.


This made me wonder if I had the correct view
definition, so I then looked at the old view definition
with a GUI tool. It looked the same. I clicked on the Data
tab, and data appeared.

This is becoming really strange, how can a view work,
but the SELECT statement not be executable outside of
the view
.


At this point, I knew it was a Postgres bug, now to figure
out what is going wrong.


When I started looking closely at the views, I noticed
something interesting. When Postgres updates the view
during the
ALTER TABLE
commands, it does so using the
alias syntax.


The following two views show an examples of with and
without using the alias syntax.

Without Alias Syntax

SELECT


schema.
table.field
,


schema2.table.field

FROM
schema.table

LEFT JOIN schema2.table

ON
schema.table.field

=
schema2.table.field;

SELECT


a.field
,


b.field

FROM
schema.table

a

LEFT JOIN
schema2.table b

ON
a.field

=
b.field
;

With Alias Syntax

As you can see from those two views, the Alias syntax
can sometimes make it easier to read the view.


The problem is that Postgres did not update the view
correctly. The following two views show how Postgres
would update the view.

Pre ALTER commands

SELECT


schema.
table.field
,


schema2.table.field

FROM
schema.table

LEFT JOIN schema2.table

ON
schema.table.field

=
schema2.table.field;

SELECT


table.field
,


table.field

FROM
schema.table

table

LEFT JOIN
schema2.table table

ON
table.field

=
table.field
;

Post ALTER commands

The problem with the “Post
ALTER

commands’ version is
that you can’t tell which table is being referenced via the
Alias.


Please
ensure you have read the bug reporting guidelines before reporting a bug. In
particular, please re
-
read the documentation to verify that what you are trying is
possible. If the documentation is not clear, please report that, too; it is a
documentation bug. If a program does something different from what the
documentation says, that is also a bug.


Poor performance is not necessarily a bug. Read the documentation or ask on one of
the mailing lists for help in tuning your applications. Failing to comply to the SQL
standard is not necessarily a bug either, unless compliance for the specific feature is
explicitly claimed.


Before you continue, check on the TODO list and in the FAQ to see if your bug is
already known. If you cannot decode the information on the TODO list, report your
problem so we can clarify the TODO list.


To report a security bug, please send an email to security@postgresql.org. All other
bugs will be forwarded to the pgsql
-
bugs mailing list where they will be publicly
archived.


Make sure you are running the latest available minor release for your major version
before reporting a bug. The current list of supported versions is
9.2.0, 9.1.5, 9.0.9,
8.4.13, 8.3.20.


This bug report form should only be used for reporting bugs and problems with the
PostgreSQL database. Problems with database connectors such as ODBC and JDBC,
graphical administration tools such as pgAdmin or other external projects
should
not be reported here
; please report to those projects directly. For products closely
connected with PostgreSQL, there may be an appropriate mailing list available.


This is a problem, we just can’t upgrade our server
without testing all of our application, etc.


So to be able to report this bug I needed to find a way to
test it on the current version of Postgres 9.0.9 since we
are running 9.0.7. What I came up with was a test script
that I could run on any version of Postgres.

CREATE SCHEMA
schema_a
;

CREATE
SCHEMA
schema_b
;

CREATE
SCHEMA
schema_c
;

CREATE
TABLE
schema_a.table_a

( id
varchar
(11), field1
varchar
(10)
);

CREATE
TABLE
schema_b.table_a

( id
varchar
(11), field1
varchar
(10)
);

CREATE TABLE
schema_b.table_b

( id
varchar
(11) );

INSERT
INTO
schema_a.table_a

VALUES ('test1', 'test2
');

INSERT
INTO
schema_b.table_a

VALUES ('test1', 'test3
');

INSERT INTO
schema_b.table_b

VALUES ('test1');

CREATE OR REPLACE VIEW
public.view_a

AS

SELECT


schema_a.table_a.field1
,


schema_b.table_a.field1
AS field2

FROM
schema_b.table_b


LEFT
JOIN
schema_b.table_a



ON
schema_b.table_b.id = schema_b.table_a.id

LEFT
JOIN
schema_a.table_a



ON
schema_a.table_a.id = schema_b.table_b.id;

The first test of the view is to make sure a
SELECT

on the
view works.


SELECT * FROM
public.view_a
;


Results:

"field1"

"field2"

"test2"

"test3"

CREATE OR REPLACE FUNCTION
schema_c.function_a

()

RETURNS void AS $$

DECLARE

def_row

RECORD;

BEGIN



SELECT
definition INTO
def_row

FROM
pg_catalog.pg_views

WHERE
viewname

=
'
view_a
';

EXECUTE
def_row.definition
;


END;

$$
LANGUAGE
plpgsql
;

The second test of the view makes sure the view definition
works.


SELECT * FROM
schema_c.function_a

();


The real result is no error messages from the function, but
behind the scene we are getting this result.


Results:

"field1"

"field2"

"test2"

"test3"

ALTER TABLE
schema_a.table_a

RENAME TO
table_d
;

ALTER
TABLE
schema_a.table_d

SET SCHEMA
schema_c
;

ALTER
TABLE
schema_b.table_a

RENAME TO
table_e
;

ALTER
TABLE
schema_b.table_e

SET SCHEMA
schema_c
;

ALTER
TABLE
schema_b.table_b

RENAME TO
table_f
;

ALTER
TABLE
schema_b.table_f

SET SCHEMA
schema_c
;

--

Use with Postgres
8.3

--
ALTER TABLE
public.view_a

SET SCHEMA
schema_c
;

--

Use with Postgres 8.4
+

ALTER
VIEW
public.view_a

SET SCHEMA
schema_c
;

Now to test the view again.


SELECT * FROM
schema_c.view_a
;


Results:

"field1"

"field2"

"test2"

"test3"

Testing the view definition again.


SELECT * FROM
schema_c.function_a

();


Results:

ERROR
: invalid reference to FROM
-
clause entry for table "
table_a


LINE
1: ...
hema_c.table_e

table_a

ON (((table_b.id)::text = (schema_c.t
...


HINT
: There is an entry for table "
table_a
", but it cannot be referenced from this part of
the query
.

QUERY
: SELECT schema_c.table_a.field1, schema_c.table_a.field1 AS field2 FROM
((
schema_c.table_f

table_b

LEFT JOIN
schema_c.table_e

table_a

ON
(((table_b.id)::text = (schema_c.table_a.id)::text))) LEFT JOIN
schema_c.table_d

table_a

ON (((schema_c.table_a.id)::text = (table_b.id)::text
)));

CONTEXT
: PL/
pgSQL

function
schema_c.function_a
() line 6 at EXECUTE
statement

CREATE OR REPLACE VIEW
public.view_a

AS

SELECT


schema_a.table_a.field1,


schema_b.table_a.field1 AS field2

FROM
schema_b.table_b


LEFT JOIN
schema_b.table_a



ON schema_b.table_b.id = schema_b.table_a.id

LEFT JOIN
schema_a.table_a



ON schema_a.table_a.id = schema_b.table_b.id;


CREATE VIEW
schema_c.view_a

(


field1,


field2)

AS

SELECT
table_a.field1
,
table_a.field1
AS field2

FROM
schema_c.table_f

table_b


LEFT JOIN
schema_c.table_e

table_a


ON
table_b.id::text = schema_c.table_a.id::text


LEFT JOIN
schema_c.table_d

table_a


ON schema_c.table_a.id::text = table_b.id::text;

To submit a bug:

http://www.postgresql.org/support/submitbug
/

Or on the front page of the Postgres website under
shortcuts, click on “Report a Bug”.


The bug will not show up on the list until it has been
approved. Mine showed up about 6
-
7 hours later.

This appears to be a variant of the what
-
alias
-
to
-
use
-
after
-
a
-
rename problem that we were discussing last winter:
http://
archives.postgresql.org/message
-
id/29791.1327718297@sss.pgh.pa.us
That thread
kinda

died off
after coming to the conclusion that there isn't a simple,
obviously correct solution. The code's current behavior is
definitely wrong in the face of renames, but I don't want to
change it in a way that just moves the failure cases elsewhere
(which is what would happen for any of the proposals made
in the thread
:
-
().


regards
, tom lane

9/18/2012


Tom talks about various ways to fix this bug.


9/20/2012


Tom wrote the first fix for the bug.


...
In the attached I've hacked around this by causing the planner to > assign new aliases to
RTEs that it replaces in this way (see
planagg.c

> and
prepunion.c

diffs). This seems like a bit
of a kluge, but it > doesn't take much code. An alternative that I'm considering is to > have
EXPLAIN make a pre
-
pass over the plan tree to identify which > RTEs will actually be
referenced, and then consider only those RTEs > while assigning aliases. This would be a
great deal more code though, > and code which would require maintenance every time we
add plan node > types etc. So I'm not sure it's really a better answer. Thoughts
?


Attached
is a second draft that does it like that. This adds about 130 lines to
explain.c

compared
to the other way, but on reflection it's probably a better solution compared to trying to kluge
things in the planner. The change in the
select_views

results shows that there's at least one
other case of duplicated RTE names that I'd not covered with the two planner kluges
.


I
think the next question is whether we want to back
-
patch this. Although the problem with
incorrect view dumping is arguably a data integrity issue (
cf

bug #7553), few enough people
have hit it that I'm not sure it's worth taking risks for. I'd feel better about this code once it'd
got through a beta test cycle. Comments
?


regards
, tom lane

Improve
ruleutils.c's

heuristics for dealing with
rangetable

aliases.


The previous scheme had bugs in some corner cases involving tables that
had been
renamed since a view
was made. This could result in dumped views
that failed
to reload or reloaded incorrectly, as seen in bug
#7553 from
Lloyd Albin
, as well as in some pgsql
-
hackers discussion back in January. Also
, its
behavior
for printing EXPLAIN plans was sometimes confusing because
of willingness
to use the same alias for
multiple RTEs (it was
Ashutosh

Bapat's

complaint about that aspect that started the January thread).


To fix, ensure that each RTE in the query has a unique unqualified alias
, by
modifying the alias if
necessary (we add "_" and digits as needed
to create
a non
-
conflicting name). Then we can just print its
variables
with that
alias, avoiding the confusing and bug
-
prone scheme of
sometimes schema
-
qualifying
variable names. In EXPLAIN, it proves to be expedient
to take
the further step of only assigning such
aliases to RTEs that
are actually
referenced in the query, since the planner has a habit
of generating
extra
RTEs with the same alias in situations such
as inheritance
-
tree
expansion.


Although this fixes a bug of very long standing, I'm hesitant to
back
-
patch such
a noticeable behavioral
change. My experiments while creating
a regression
test convinced me that actually incorrect output (as
opposed
to confusing
output) occurs only in very narrow cases, which is backed up
by the
lack of
previous complaints from the field. So we may be better
off living
with it in released branches; and in any
case it'd be smart to
let this
ripen awhile in HEAD before we consider back
-
patching it.

Entered

CREATE OR REPLACE
VIEW

public.view_a

AS

SELECT


schema_a.table_a.field1,


schema_b.table_a.field1 AS field2

FROM
schema_b.table_b

LEFT JOIN
schema_b.table_a


ON schema_b.table_b.id =
schema_b.table_a.id

LEFT JOIN
schema_a.table_a


ON schema_a.table_a.id =
schema_b.table_b.id;

CREATE OR REPLACE
VIEW

public.view_a

(field1, field2) AS

SELECT


table_a_1.field1
,



table_a.field1
AS field2

FROM
schema_b.table_b


LEFT
JOIN
schema_b.table_a



ON
table_b.id::text
=



table_a.id
::
text


LEFT
JOIN
schema_a.table_a

table_a_1



ON
table_a_1.id::text
=



table_b.id
::text;

9.3 DDL

Pre ALTER 9.3 DDL

CREATE OR REPLACE
VIEW

public.view_a

(field1, field2) AS

SELECT


table_a_1.field1,


table_a.field1 AS field2

FROM
schema_b.table_b


LEFT JOIN
schema_b.table_a


ON table_b.id::text =


table_a.id::text


LEFT JOIN
schema_a.table_a

table_a_1


ON table_a_1.id::text =


table_b.id::text;


CREATE OR REPLACE VIEW
schema_c.view_a

(field1, field2) AS

SELECT



table_d.field1,



table_e.field1
AS
field2

FROM
schema_c.table_f



LEFT
JOIN
schema_c.table_e



ON
table_f.id::text
=



table_e.id
::
text



LEFT
JOIN
schema_c.table_d



ON
table_d.id::text
=



table_f.id
::text;

Post ALTER 9.3 DDL

4 days after submission, they had it fixed in the source code
for Postgres 9.3


As part of a presentation on how to compile Postgres source
code on a Raspberry PI, I compiled Postgres 9.3 and was able
to test how the new code will work as shown on the two
preceding slides.


I also really like the way the new code stops using alias’s when
they were no longer needed, thereby cleaning up the DDL.


Many thanks to Tom Lane for the source code changes.