January 29, 2007 at 9:36AM No Pinnocchio Relations
Here’s something about relational database design I, and I don’t believe I’m alone in this, have a tendancy to forget: Pinnochio relations are bad.
But what’s a Pinnochio relation, you ask? You know when you have some data that you want to store in a table, but doesn’t appear at first glance to either fit into your existing schema or deserve to be treated as a proper relation, so you just serialise it into, say, JSON, XML or something and toss it into a column? That’s a Pinnochio relation: one that’s almost but not quite real, but with a bit of effort can be.
Here’s a simple example from something I’m working on at work, obfuscated, of
course. The database as a relation called people, which, as you might guess,
holds a list of people. Here’s what it looks like:
CREATE TABLE people (
id INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,
name VARCHAR(64) NOT NULL,
email VARCHAR(64) NOT NULL,
address TINYTEXT NOT NULL,
PRIMARY KEY (id),
INDEX ix_email (email(8))
);
What complicates this picture is that the system takes plug-ins that talk to
other systems and these plug-ins need to store data about each person in the
people relation. However, this data is somewhat ad-hoc, so what do you do?
There are a few obvious answers. Firstly, you could add a bunch of placeholder columns to the table and allow each of the plug-ins to take a range of them for their own use. But that’s not very smart. Secondly, you could store all the data as a serialised associative array, which is what I did initially as I never expected to have to do anything other than pull it out of the table occasionally. Thirdly, you could create a table with a primary key consisting of the person’s internal id and the name of the key, along with the associated value; essentially an associative array implemented in a table.
Of course, if you want to do anything more than store and retrieve this ad-hoc data, everything’s ok. But what if what you want to do something more interesting with the data, such as, say, doing a lookup on it, or performing some kind of analysis on it? You might be able to do it, but it’ll be hard and error-prone. You’re kind of screwed, really.
Though data like this doesn’t appear to be relational, it really is. One just has to get out of the habit of looking for all the relations, no just ones that map onto objects in your domain model.
In my case, I discovered that some housekeeping code I needed to write needed to be able to do lookups on the value of some one of those bits of ad-hoc data. The solution was to treat the data associated with a particular plug in and person as a relation of its own, which gave me a relation something like this:
CREATE TABLE randomplugin_people (
id INTEGER UNSIGNED NOT NULL,
ref CHAR(4) NOT NULL,
code INTEGER UNSIGNED NOT NULL,
PRIMARY KEY (id),
UNIQUE INDEX ix_ref (ref)
);
Now I can do quick lookups on ref to retrieve the associated person, join
it to get all the associated data, or whatever I want, and all because I
recognised the data as a proper relation all of its own.
1 On February 3, 2007 at 1:47, Topper wrote:
What are you on about? Anybody? No? Anybody? Blink Blink
I’m sure it’s all quite logical and brilliant. It’s late.