Saturday, July 3, 2010

pg_sample: extract a sample dataset from a larger PostgreSQL database

pg_sample is a PostgreSQL utility for making smaller versions of large databases.

download pg_sample 0.01

When you have a relatively large database (tables with, say, millions or billions of rows), it can be difficult to generate smaller datasets to work with, especially if foreign keys are heavily used.

That's where this script comes in. It will create smaller instances of each table along with any additional rows needed to satisfy foreign key constraints (circular dependencies are supported).

The script's operation closely resembles that of pg_dump. For example, assuming we have a large database named largedb, a smaller version could be produced with:

createdb smalldb
pg_sample largedb | psql smalldb
The smalldb would then contain a subset of largedb's data.

Here are the command-line options (many of which mirror pg_dump):

-a
--data-only
Output only the data, not the schema (data definitions).

-E *encoding*
--encoding=*encoding*
Use the specified character set encoding. If not specified, uses the
environment variable PGCLIENTENCODING, if defined; otherwise, uses
the encoding of the database.

-f *file*
--file=*file*
Send output to the specified file. If omitted, standard output is
used.

--force
Drop the sample schema if it exists.

--keep
Don't delete the sample schema when the script finishes.

--limit=*number*
The maximum number of rows to initially copy from each table
(defaults to 100). Note that sample tables may end up with
significantly more rows in order to satisfy foreign key constraints.

--random
Randomize the rows initially selected from each table. May
significantly increase the running time of the script.

--schema=*name*
The schema name to use for the sample database (defaults to
_pg_sample).

--trace
Turn on Perl DBI tracing. See the DBI module documentation for
details.

--verbose
Output status information to standard error.

The following options control the database connection parameters.

-h *host*
--host=*host*
The host name to connect to. Defaults to the PGHOST environment
variable if not specified.

-p *port*
--port=*port*
The database port to connect to. Defaults to the PGPORT environment
variable, if set; otherwise, the default port is used.

-U *username*
--username=*username*
User name to connect as.

-W *password*
-password=*password*
Password to connect with.
See also: pg_sample Github source repository

Sunday, May 9, 2010

ip2host 1.11 Release

A new release of ip2host is available. It's a small maintenance release incorporating changes from the Debian package maintainer. The source repository has also been moved to Github.

http://github.com/mla/ip2host

DESCRIPTION

    Resolves IPs to hostnames in web server logs. This is a faster, drop-in
    replacement for the logresolve utility distributed with the Apache web
    server.

CHANGELOG

ip2host 1.11

  * Silence warnings thanks to Andrew McNaughton and Gunnar Wolf
    http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=563129

Saturday, May 8, 2010

Installing Adobe AIR on 64-bit Ubuntu 9.10 Linux

Adobe AIR isn't currently available for 64-bit Linux, but they provide instructions on running the 32-bit version.

I've converted the Ubuntu instructions into this shell script:
http://sites.google.com/site/mlawire/installing-adobe-air-1-5-on-64-bit-ubuntu-linux/install-adobe-air-ubuntu-64bit.sh

From a terminal:

wget http://sites.google.com/site/mlawire/installing-adobe-air-1-5-on-64-bit-ubuntu-linux/install-adobe-air-ubuntu-64bit.sh
chmod a+rx install-adobe-air-ubuntu-64bit.sh
sudo ./install-adobe-air-ubuntu-64bit.sh
This worked for me on Ubuntu 9.10. Please let me know if you try it on other Ubuntu releases or otherwise have trouble with it.

Keywords: Installing Adobe AIR 64-bit Linux Ubuntu