talideon.com

Entries for February 2008

February 1, 2008 at 5:50PM Things I hate: keyboard bashers

These bastards must die. They get my fucking nut!

Sure, everybody has bad days and everybody gets angry, but be considerate to your co-workers and don’t fucking beat the keyboard to a pulp! Consider that these people have stuff that they have to do at work and that it’s hard to concentrate when somebody close by is pummelling away.

If you find yourself bashing the keyboard, or if somebody mentions to you that you’re bashing it, consider doing something more productive like going out for a walk, getting some tea, buying a chiclet keyboard online, heading to a gym, or possibly taking your pent-up anger out on the object of your extreme ire.

But don’t force your anger on those around you. Put yourself in their shoes and imagine what it’s like having the sonic equivalent of a jackhammer belting away in your ear for nine hours. That’s not cool, so stop it.

Thank you for reading this public service announcement.

Update: The expensive noise-blocking headphones I got at the weekend might help. They’ve been working so far.

February 4, 2008 at 1:28PM Fixing a FreeBSD annoyance: shell editing keybindings

Unlike all the other systems I use, FreeBSD is lacking in one tiny little area: default shell keybindings. Specifically, there are no keybindings associated with the arrow keys when CTRL is held down.

I’m used to being able to move back a word with CTRL+LeftArrow, forward a word with CTRL+RightArrow, to the start of the line with CTRL+UpArrow, and to the end of the line with CTRL+DownArrow, but none of the shells are configured with these bindings by default. The editing functions themselves are bound to keys, just not to these ones.

This is a bit of an annoyance, but it’s only recently that I’ve become sufficiently annoyed to fix it. Here are my zsh bindings, which also work with tcsh:

bindkey '\e[1;5D' backward-word
bindkey '\e[1;5C' forward-word
bindkey '\e[1;5A' beginning-of-line
bindkey '\e[1;5B' end-of-line

Here are the bash bindings:

bind '"\e[1;5D": backward-word'
bind '"\e[1;5C": forward-word'
bind '"\e[1;5A": beginning-of-line'
bind '"\e[1;5B": end-of-line'

And finally, the FreeBSD sh bindings:

bind "^[[1;5D" ed-prev-word
bind "^[[1;5C" ed-next-word
bind "^[[1;5A" ed-move-to-beg
bind "^[[1;5B" ed-move-to-end

There, fixed.

February 6, 2008 at 5:11PM I’m in yr datacentur...

I know I shouldn’t but I can’t help myself...

I'm in yr datacentur, killin' yr chillin'

It may or may not concern this...

Disclosure: I’m a senior developer at Blacknight Solutions, one of their competitors, but this post is simply meant in good-humoured jest. We all have our bad days...

Oh, and did I mention that this has absolutely nothing whatsoever to do with my employers? In fact, Michele even gave me a slap on the wrist for it, so there!

February 6, 2008 at 9:47PM Channel One: These Roads

[Band Website; MySpace]

February 6, 2008 at 10:19PM Channel One: Accelerate; Brake

Top tune, top video:

[Band Website; MySpace]

February 8, 2008 at 2:55PM Things I hate: PHP errors

While I’m more comfortable writing code in it than in some other languages, there’s one thing about PHP that makes me want to beat the developers around the head: how it deals with errors, and specifically how it doesn’t give a 500 Internal Server Error status code when an error occurs and it has a chance to do so.

Take this simple piece of code:

<?php bug();

This should guarantee us an error because there’s no bug() function defined. Let’s save that to a file:

Saving the sample code to a file

Now, let’s view it in Firefox:

Viewing it in Firefox

A fatal error, just as we expected. You’ll notice that the error isn’t the usual one. That’s because I use the excellent xdebug, though this has no bearing on PHP’s behaviour in this regard.

Let’s do a HEAD request:

Results of the HEAD request: Whiskey Tango Foxtrot!

200 OK?

A fatal error occurs that PHP could trap before any output gets sent and I get a 200 OK?! That’s just wrong, and here’s why:

Let’s say you’re writing a machine-readable service of some kind. Clients of this service rely on the server they’re talking to giving back reliable status codes to tell if their request has succeeded.

Now, say said service has a bug in it such as the one demonstrated above. How exactly are clients meant to tell if their request has succeeded or not?

PHP is a pain for writing machine-readable services in. php -l won’t catch these kinds of bugs and Providence knows it’s easy enough for these kinds of things to get through in the form of spelling errors. It ought to be at least possible to get PHP to send 500 status codes in the case of error even if that’s not the default setting. Fail fast is a good thing in a machine-readable system.
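To make the client’s side of this concrete, here’s a minimal Python 3 sketch. It isn’t PHP and it isn’t taken from any real service: head_status() and request_succeeded() are names made up for illustration. It assumes a client that, like most HTTP clients, treats any 2xx status as success, which is exactly why a fatal error served with a 200 OK slips through unnoticed.

```python
import http.client


def request_succeeded(status):
    """Treat only 2xx status codes as success (an assumed client policy)."""
    return 200 <= status < 300


def head_status(host, path="/", port=80, timeout=5.0):
    """Send a HEAD request and return the numeric status code."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", path)
        return conn.getresponse().status
    finally:
        conn.close()


# A script that dies with a fatal error but still answers "200 OK" looks
# like a success to this client; only a 5xx status would flag the failure.
if __name__ == "__main__":
    for status in (200, 500):
        print(status, "success" if request_succeeded(status) else "failure")
```

With a check like this, the only way the client could spot the broken script is by parsing the response body for error text, which rather defeats the purpose of having status codes.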

February 11, 2008 at 3:11PM From my ~/bin: find-duplicates

This is starting to become a bit of a series! I needed a small, sharp tool for finding what duplicate files I had sitting on my external hard drive. I’ve a lot of stuff which, over time, has got inadvertently duplicated or downloaded twice, such as photos, software archives, sound files, papers and articles, &c. It needed to be fast and small, capable of having several directories searched at once (e.g., my home directory and some directory on the external HDD), and grepable. Here’s the result:

#!/usr/bin/env python
#
# find-duplicates
# by Keith Gaughan <http://talideon.com/>
#
# Finds and lists any duplicate files in the given directories.
#
# Copyright (c) Keith Gaughan, 2008.
# All Rights Reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
#  1. Redistributions of source code must retain the above copyright
#     notice, this list of conditions and the following disclaimer.
#
#  2. Redistributions in binary form must reproduce the above copyright
#     notice, this list of conditions and the following disclaimer in the
#     documentation and/or other materials provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY AUTHOR AND CONTRIBUTORS "AS IS" AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL AUTHOR OR CONTRIBUTORS BE LIABLE FOR ANY
# DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
# (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
# LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
# ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
# THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
# This license is subject to the laws and courts of the Republic of Ireland.
#

from __future__ import with_statement
import sys
import os
import hashlib
import getopt
import filecmp


USAGE = "Usage: %s [-h] [-m<crc|md5>] <dir>*"


class crc:
    """
    Wraps up zlib.crc32 to make it suitable for use as a faster but less
    accurate alternative to the hashlib.* classes.
    """
    def __init__(self, initial=None):
        self.crc = 0
        if initial is not None:
            self.update(initial)
    def update(self, block):
        import zlib
        self.crc = zlib.crc32(block, self.crc)
    def hexdigest(self):
        return "%X" % self.crc
    def digest(self):
        # Er...
        return self.crc


def all_files(*tops):
    """Lists all files in the given directories."""
    for top in tops:
        for dirname, _, filenames in os.walk(top):
            for f in filenames:
                path = os.path.join(dirname, f)
                if os.path.isfile(path):
                    yield path


def digest(file, method=hashlib.md5):
    # Open in binary mode so the bytes hashed are exactly those on disk.
    with open(file, "rb") as f:
        h = method(f.read()).digest()
    return h


def true_duplicates(files):
    """
    Compare the given files, breaking them down into groups with identical
    content.
    """
    while len(files) > 1:
        next_set = []
        this_set = []
        master = files[0]
        this_set.append(master)
        for other in files[1:]:
            if filecmp.cmp(master, other, False):
                this_set.append(other)
            else:
                next_set.append(other)
        if len(this_set) > 1:
            yield this_set
        files = next_set


def group_by(groups, grouper, min_size=1):
    """Breaks each of the groups into smaller subgroups."""
    for group in groups:
        subgroups = {}
        for item in group:
            g = grouper(item)
            if not subgroups.has_key(g):
                subgroups[g] = []
            subgroups[g].append(item)
        for g in subgroups.itervalues():
            if len(g) >= min_size:
                yield g


def usage(message=None):
    global USAGE
    fh = sys.stdout
    exit_code = 0
    if message:
        fh = sys.stderr
        exit_code = 2
        print >>fh, str(message)
    name = os.path.basename(sys.argv[0])
    print >>fh, USAGE % (name,)
    sys.exit(exit_code)


def main():
    try:
        opts, paths = getopt.getopt(sys.argv[1:], "hm:")
    except getopt.GetoptError, err:
        usage(err)
    method = crc
    for o, a in opts:
        if o == "-m":
            if a == "crc":
                method = crc
            elif a == "md5":
                method = hashlib.md5
            else:
                usage("Unknown grouping method: %s" % (a,))
        elif o == "-h":
            usage()
        else:
            usage("Unknown option: %s%s" % (o, a))

    if len(paths) == 0:
        paths = ["."]

    first = True
    groups = [all_files(*paths)]
    for grouper in [os.path.getsize, lambda file: digest(file, method)]:
        groups = group_by(groups, grouper, 2)
    for group in groups:
        for files in sorted(true_duplicates(group)):
            if first:
                first = False
            else:
                print
            for file in files:
                print file


if __name__ == "__main__":
    main()

It’s pretty simple, really. It walks the directories listed on the command line, sorting the files it finds into groups of files likely to be similar. The default criterion is to group them based on file size, which is quick and usually a good indicator that the files are the same. If it’s not, such as if you’re dealing with a bunch of RAW images or TARGA files that are all the same size, other criteria can be used, such as the result of a CRC (CRC-32, specifically, by way of zlib.crc32) run on the files, or their MD5 hash. CRCs might be useless as cryptographic hashes, but they’re much quicker than a proper hash, and give a reasonable indication of whether the files are the same. After that, each file in the group is explicitly compared for equality to ensure that they really are duplicates, and the group is broken into subgroups. This ensures that you won’t accidentally delete anything that might appear to be a duplicate, but really isn’t.

When it finds identical files, they’re listed together, one per line, in groups separated by empty lines.

Update (Feb 12th): Got rid of some stupid inefficiencies. I’d forgotten that the read() method on filehandles will read everything if no argument’s provided. This speeds it up somewhat when using the CRC or MD5 methods. For the size-based method, I think I’m going to introduce some extra grouping based on one of the other two methods to avoid needless file comparisons. Oh, and whittle() returns a generator now rather than a list.

Update (Feb 14th): Threat carried out! whittle() has been replaced by a more general method called group_by() which is similar but accepts an iterable of iterables instead of an iterable, and rather than returning a generator, it is a generator. Also, after the initial grouping by size, it groups by CRC, though this can be changed to use MD5 with a switch. Now the initial cheap grouping by size is always done, and the more expensive grouping methods (by hash/checksum, then direct comparison) are done after. This is much faster.
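For anybody who’d rather see the pipeline without the option parsing, here’s a condensed sketch of the same size, then checksum, then byte-for-byte comparison idea. It’s written for Python 3 rather than the Python 2 of the script above, so treat it as an illustration and not a drop-in replacement. crc32_of() and find_duplicates() are names invented for this sketch, and it reads files in blocks where the script reads each file in one go.

```python
import filecmp
import os
import zlib
from collections import defaultdict


def crc32_of(path, block_size=65536):
    """Return the CRC-32 of a file's contents, read in fixed-size blocks."""
    crc = 0
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            crc = zlib.crc32(block, crc)
    return crc


def group_by(groups, key, min_size=2):
    """Split each group into subgroups sharing a key; drop the small ones."""
    for group in groups:
        subgroups = defaultdict(list)
        for item in group:
            subgroups[key(item)].append(item)
        for subgroup in subgroups.values():
            if len(subgroup) >= min_size:
                yield subgroup


def true_duplicates(files):
    """Yield lists of files whose contents are byte-for-byte identical."""
    files = list(files)
    while len(files) > 1:
        master = files[0]
        same, different = [master], []
        for other in files[1:]:
            if filecmp.cmp(master, other, shallow=False):
                same.append(other)
            else:
                different.append(other)
        if len(same) > 1:
            yield same
        files = different


def find_duplicates(paths):
    """Cheap grouping first (size), then CRC-32, then direct comparison."""
    groups = [paths]
    for key in (os.path.getsize, crc32_of):
        groups = group_by(groups, key)
    for group in groups:
        yield from true_duplicates(group)
```

Calling find_duplicates() with a list of file paths gives back one list per set of identical files. The ordering of the stages is the whole point: getting a file’s size doesn’t require reading it, so most files are ruled out before anything gets checksummed or compared.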

February 18, 2008 at 12:37AM Why do I bother?

I’d big plans to come into work this weekend and get some crap that had been slowing my progress out of the way once and for all. What happened? I barely got out of bed on Saturday, and on Sunday I eventually got off my ass and went in at 8pm. Yes, 8pm.

Then when I’m in there, do I actually do anything? Aside from a few bugfixes and some almost pointless refactorings, I got absolutely nowhere. It’s depressing. Part of the point of coming in this weekend was to avoid distractions and just work. Now I’m wishing I’d come in early on Saturday with a copy of the BZR repositories I have for doing work on my laptop, merged in my changes, and worked at home for the weekend. Coming into the office definitely hasn’t done me any good.

Never mind me, I’m just annoyed with myself. I’m off home...

Feb 23rd: And on the topic of things that piss me off, here’s what I think of the stupid data-truncating MySQL server that drives this site:

I18N FAIL

The stupid thing keeps truncating stuff with non-ASCII characters. It sucks ass and in spite of my best efforts, it fails utterly.

February 25, 2008 at 11:48PM Suburban Kids with Biblical Names: Loop duplicate my heart

How brilliantly geeky!

February 26, 2008 at 10:24PM Ciaran Byrne: Ode to Able Sail

Very Boards of Canada, and very, very good.

[MySpace]

February 27, 2008 at 1:37PM Separated at birth?

On the back of Damien’s recent post on Ogra FF members, I just couldn’t help myself:

Separated at birth? Brian Cowen and Fred Flintstone

February 27, 2008 at 6:45PM K-OS: Sunday Morning

Sure, it’s pop, but it’s great pop!

February 28, 2008 at 9:35PM LCD Soundsystem: Someone Great

Even if you don’t listen to the lyrics, you’ve just got to love that simple, spare hook. Someone Great is, without a doubt, a masterpiece.

Lyrics

The ones not in the radio/video mix but in the full album version are [bracketed].

I wish that we could talk about it
but there, that’s the problem.
With someone new I couldn’t start it;
too late for beginnings.
The little things that made me nervous
are gone in a moment.
I miss the way that we used to argue,
locked in the basement.

[I wake up and the phone is ringing.
Surprised, as it’s early
and that should be a perfect warning
that something’s a problem.
To tell the truth I saw it coming:
the way you were breathing.
But nothing can prepare you for it;
the voice on the other end...]

The worst is all the lovely weather:
I’m stunned it’s not raining.
The coffee isn’t even bitter
because, what’s the difference?
There’s all the work that needs to be done,
it’s late for revision.
There’s all the time and all the planning
and songs to be finished.

And it keeps coming, (x3)
’til the day stops.
And it keeps coming, (x3)
’til the day stops.

And it keeps coming, (x7) ’til the day stops.

I wish that we could talk about it
but there, that’s the problem.
With someone new I couldn’t start it;
too late for beginnings.
You’re smaller than my wife imagined;
surprised you were human.
There shouldn’t be this radio silence,
but what are the options?

When someone great is gone! (x8)

[We’re safe for the moment,
saved, for the moment.]

February 28, 2008 at 11:21PM Siobhan Donaghy: Ghosts

Proof that, yes! You too can have a credible music career after being in a girl group. Very Cocteau Twins.

[Website; MySpace]

I know I’m posting a lot of music videos recently, but the topic of this blog is me, yes, me! And the things I like, of course. There’s no such thing as self-centred around these parts unless it’s about somebody else. [smile]