Using diff/patch from within PostgreSQL
Sometimes you might want to store diffs of two strings in the database, so you save some space and actually see what has changed - and apply those patches to a string to get from one version to the other.
I'll be using PL/PerlU, i.e. untrusted Perl as procedural language in PostgreSQL, so you'll have to check that this is available in your database. For this to be the case, PostgreSQL has to be built with Perl-support and you'll have to enable PL/PerlU for your database using createlang:
createlang -U postgres -d yourdb plperlu
If you compile PostgreSQL from source, make sure you configure --with-perl. If you use a pre-packaged version, you will very likely need to install a separate package - on Debian Etch, this would be postgresql-plperl-8.1, on Debian Lenny postgresql-plperl-8.3 and on RedHat type systems it would be the postgresql-plperl rpm. This would need to be done before the createlang of course. Thanks to Robert Treat for the added info.
Having "Untrusted" Perl in the database may require some additional security measures on your side if you have got several users on your database with different privilege levels. I won't go into more detail here as this didn't apply in my case. On the upside, PL/PerlU will allow you to use the strict pragma and, what's even more important, access external libraries. In this case, you need to install Text::Diff and Text::Patch from CPAN.
Now on to the functions:
diff-function
CREATE or REPLACE FUNCTION "public"."perldiff"(text,text)RETURNS text AS $BODY$ use strict; use Text::Diff; my $src = $_[0] . "\n"; my $dst = $_[1] . "\n"; my $diff = diff (\$src, \$dst , { STYLE => 'Unified', CONTEXT => 0 }); return $diff; $BODY$ LANGUAGE 'plperlu' IMMUTABLE;
patch-function
CREATE or REPLACE FUNCTION "public"."perlpatch"( text, text) RETURNS text AS $BODY$ use strict; use Text::Patch; my $src = $_[0]; my $diff = $_[1] ; my $result = patch( $src, $diff, STYLE => 'Unified'); return $result; $BODY$ LANGUAGE 'plperlu' IMMUTABLE;
Have fun!

June 16th, 2009 - 02:40
If you’re running a pre-packaged version of postgres, it’s likely there is a separate package for pl/perl support; for example, on red hat type systems, you would install the postgresql-plperl rpm before issuing the create language command above. HTH.
June 16th, 2009 - 18:26
Thank you for adding this info – I am so used to compiling from source that I just didn’t think about the majority of users who probably use a pre-packaged version. I’ll add that to the article.
July 6th, 2009 - 22:07
Hi! I like your article and I would like very much to read some more information on this issue. Will you post some more?
July 7th, 2009 - 09:11
@KonstantinMiller: We’re always open for suggestions, so if you want me to go into more depth with this, ask away! We’re currently using this for a trigger based versioning system in our CMS; we’ve got articles with several thousand characters and I wanted to avoid having to store a full version of the whole article if somebody just corrected some spelling mistake. This just stores the differences for the changed lines. Of course there may be other uses for this, for example if you’re having some kind of source control in your database, let’s say for website templating. This mechanism will provide you with space efficient storage, easy backtracking through changes and a basis for auditing who changed what and when.