talideon.com

« insert random witticism here »

January 29, 2007 at 9:36AM No Pinnocchio Relations

Here’s something about relational database design I, and I don’t believe I’m alone in this, have a tendancy to forget: Pinnochio relations are bad.

But what’s a Pinnochio relation, you ask? You know when you have some data that you want to store in a table, but doesn’t appear at first glance to either fit into your existing schema or deserve to be treated as a proper relation, so you just serialise it into, say, JSON, XML or something and toss it into a column? That’s a Pinnochio relation: one that’s almost but not quite real, but with a bit of effort can be.

Here’s a simple example from something I’m working on at work, obfuscated, of course. The database as a relation called people, which, as you might guess, holds a list of people. Here’s what it looks like:

CREATE TABLE people (
    id      INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,
    name    VARCHAR(64)      NOT NULL,
    email   VARCHAR(64)      NOT NULL,
    address TINYTEXT         NOT NULL,

    PRIMARY KEY  (id),
    INDEX ix_email (email(8))
);

What complicates this picture is that the system takes plug-ins that talk to other systems and these plug-ins need to store data about each person in the people relation. However, this data is somewhat ad-hoc, so what do you do?

There are a few obvious answers. Firstly, you could add a bunch of placeholder columns to the table and allow each of the plug-ins to take a range of them for their own use. But that’s not very smart. Secondly, you could store all the data as a serialised associative array, which is what I did initially as I never expected to have to do anything other than pull it out of the table occasionally. Thirdly, you could create a table with a primary key consisting of the person’s internal id and the name of the key, along with the associated value; essentially an associative array implemented in a table.

Of course, if you want to do anything more than store and retrieve this ad-hoc data, everything’s ok. But what if what you want to do something more interesting with the data, such as, say, doing a lookup on it, or performing some kind of analysis on it? You might be able to do it, but it’ll be hard and error-prone. You’re kind of screwed, really.

Though data like this doesn’t appear to be relational, it really is. One just has to get out of the habit of looking for all the relations, no just ones that map onto objects in your domain model.

In my case, I discovered that some housekeeping code I needed to write needed to be able to do lookups on the value of some one of those bits of ad-hoc data. The solution was to treat the data associated with a particular plug in and person as a relation of its own, which gave me a relation something like this:

CREATE TABLE randomplugin_people (
    id   INTEGER UNSIGNED NOT NULL,
    ref  CHAR(4)          NOT NULL,
    code INTEGER UNSIGNED NOT NULL,

    PRIMARY KEY  (id),
    UNIQUE INDEX ix_ref (ref)
);

Now I can do quick lookups on ref to retrieve the associated person, join it to get all the associated data, or whatever I want, and all because I recognised the data as a proper relation all of its own.

Technorati Search Technorati Search Irish Bloggers

Comments

1 On February 3, 2007 at 1:47, Topper wrote:

What are you on about? Anybody? No? Anybody? Blink Blink

I’m sure it’s all quite logical and brilliant. It’s late.

Post a comment

All form information is optional, but it’s a good idea to fill in your name and email address if you want me to take your comment seriously.

Spammers, don’t bother posting crap down here. The site is set up so that legitimate search engines (Google, for instance) won’t index pages with comments on them. Posting crud here only means you’re wasting my time and patience. Shoo!

Real names, please. Please include!
Won’t be displayed. Please include!
Displayed, if present.