talideon.com

March 30, 2008 at 7:19PM “Somebody broke USE_GNOME!”, or Why .0 versions are a bitch

I’ve had my evenings for the last week ruined because of two little lines of code.

Yep, two lines.

Not my lines though, but I’d been wisely assuming that this was all my fault.

Some background: when I was up home in Sligo at Easter, I updated my ports tree and upgraded my laptop to FreeBSD 7.0. Aside from needing to switch from my tweaked kernel (which let me mount large FAT32-formatted external HDDs) to the GENERIC kernel before I could upgrade, it went off without a hitch. Before I left the house, I had portupgrade fetch the distfiles for any ports that needed upgrading and figured that I could continue that part of the upgrade down in Carlow.

All the ports I’d installed while running FreeBSD 6.2 ran just fine under FreeBSD 7.0, but I thought it was worth rebuilding everything so that next time I did an update, I wouldn’t have old 6.x libraries cluttering up the system.

That was my mistake.

As I was later to discover, there was a problem lurking deep within the ports infrastructure, specifically the Mk/bsd.gnome.mk file. Here’s the fix, for those who can’t wait for the rest of the story:

--- bsd.gnome.mk.orig	2008-03-29 18:21:06.000000000 +0000
+++ bsd.gnome.mk	2008-03-29 18:21:36.000000000 +0000
@@ -712,6 +712,8 @@
 FETCH_DEPENDS+=	${${component}_FETCH_DEPENDS}
 EXTRACT_DEPENDS+= ${${component}_EXTRACT_DEPENDS}
 BUILD_DEPENDS+=	${${component}_BUILD_DEPENDS}
+LIB_DEPENDS+=	${${component}_LIB_DEPENDS}
+RUN_DEPENDS+=	${${component}_RUN_DEPENDS}
 #######################################################

 .if !defined(WITHOUT_HACK)

Those in the know should be able to figure out what happened from that. That first extra line is the difference between a 17MB INDEX-7 file and a 24MB INDEX-7 file, and the second one adds on another 1MB again. That’s quite a bit of missing data. It’s critical that after applying that patch, you do the following:

$ cd /usr/ports
$ sudo rm -rf INDEX-*
$ sudo make index
$ sudo portsdb -u
$ sudo pkgdb -aF

Otherwise the port tools won’t have the right dependencies, and will start doing stupid things like building the same port twice, regardless of whether it’s installed or not.
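Before rebuilding the index, it’s worth confirming that your copy of the file really has both lines. Here’s a minimal sketch of such a check. The check_gnome_mk function name is made up for illustration, and it assumes the ports tree lives in /usr/ports and that the lines look exactly like the ones in the patch above:

```shell
#!/bin/sh
# Hypothetical helper (not part of the ports tools): report whether a
# bsd.gnome.mk-style file collects LIB_DEPENDS and RUN_DEPENDS from each
# USE_GNOME component. The default path is an assumption.
check_gnome_mk() {
    mk_file="${1:-/usr/ports/Mk/bsd.gnome.mk}"
    status=0
    for kind in LIB_DEPENDS RUN_DEPENDS; do
        # Fixed-string search for e.g. ${${component}_LIB_DEPENDS}
        if grep -qF "\${\${component}_${kind}}" "$mk_file"; then
            echo "${kind}: collected"
        else
            echo "${kind}: MISSING"
            status=1
        fi
    done
    return "$status"
}
```

Calling check_gnome_mk with no arguments inspects the default path; pass another path to check a copy elsewhere. It returns non-zero if either line is absent.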

I ran a full update with portupgrade and discovered that when I’d run it previously, it had only done a shallow distfile fetch: it fetched the distfiles of whichever ports had been updated, but didn’t catch cases where those ports now depended on other ports too. I was annoyed, but learned my lesson: don’t depend on portupgrade to fetch updates, and stick with portmaster for that from now on. The house I live in in Carlow has no internet connectivity (not by my choice), so I collected a list of the missing distfiles and pulled them down in work.

My second attempt at rebuilding my ports failed, but in quite subtle ways. The first inkling I got that something was wrong was that any GTK apps written in Python would hang when loading the GTK libraries. When I rebooted, dbus was screwed, and nautilus would just hang in the background, completely undead.

In retrospect, what aggravated the situation enough for me to discover the true problem was clearing out the contents of /usr/local/lib/compat/pkg. I was rebuilding everything and not keeping any shared libraries, so I figured it’d be safe. Yes, I know you’re not supposed to do this, but it’s my machine and if I break it, it hurts nobody but me, and it’s not as if I didn’t understand the possible consequences.

I had no idea what was wrong and had no way to diagnose it at the time, so I decided to wipe every single port and clear any leftover junk from /usr/local that I didn’t want to keep. There was a fair bit of it, more than there ought to have been.

I decided to do a minimal install. After installing the basics, I went on to reinstalling Vim. My copy is set up to depend on GTK+ 2, so I’d expected it to pull that in and build it. However, it didn’t, and both it and dbus-glib exploded in my face, saying that they couldn’t find GLib. I noted what’d happened and configured Vim to build without any X11 support whatsoever, and it built fine.

Seeing as GNOME/GTK/GLib seemed to be at the root of my difficulties, I figured I’d only install anything that could have its dependencies on them removed, or which didn’t depend on them in the first place. That meant no HAL or DBus, or anything that depended on them. I configured my X server to get rid of its HAL dependency and installed the only usable window manager I had that didn’t depend on them to some degree: dwm.

Now that I’d something usable and I’d set up xterm to use a decent dark colour scheme, I set to work figuring out why everything was broken. My suspicions fell upon the USE_GNOME knob, which seemed to be the focus for all the breakage. Once I’d figured out enough of the workings behind the ports system, it led me to Mk/bsd.gnome.mk, the makefile that USE_GNOME controls. I figured out how the values in USE_GNOME relate to various dependencies, and traced through the building of a typical small port from scratch, dbus-glib. That’s when I realised that it was ignoring any library dependencies, so I patched it so that it’d collect them properly. That’s the first patch line you’ll see. That fixed dbus-glib, so I thought I’d found the problem. I was stoked, so I decided to try to install something more directly useful: Meld.
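To make the mechanism concrete, here’s a toy model in shell of what that part of the makefile does: for each component named in USE_GNOME, it appends that component’s per-kind dependency list to the port’s overall list. This is only a sketch. The collect_deps function and the dependency values are invented for illustration and are not copied from the real bsd.gnome.mk:

```shell
#!/bin/sh
# Toy model of the per-component collection loop. The values below are
# illustrative placeholders, not the real definitions from bsd.gnome.mk.
glib20_LIB_DEPENDS="glib-2.0:devel/glib20"
gnomehier_RUN_DEPENDS="BSD.gnome.dist:x11/gnomehier"

# collect_deps KIND COMPONENT...
# Prints the space-separated union of <component>_<KIND> for every component.
collect_deps() {
    kind="$1"
    shift
    collected=""
    for component in "$@"; do
        # Indirect lookup, e.g. value=${glib20_LIB_DEPENDS:-}
        eval "value=\${${component}_${kind}:-}"
        if [ -n "$value" ]; then
            collected="${collected:+$collected }$value"
        fi
    done
    echo "$collected"
}
```

In this model, collect_deps LIB_DEPENDS glib20 gnomehier prints the glib20 entry and collect_deps RUN_DEPENDS glib20 gnomehier prints the gnomehier entry. If nothing ever calls it for a given kind, which is effectively what a missing += line amounts to, those dependencies never reach the port.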

I need to compare and patch files and directories of files quite a bit, so Meld is invaluable. It’s GNOME’s killer app as far as I’m concerned and about the only reason I’ve the gnome2-lite metaport installed at all. Tomboy comes a close second, but that’s mainly because I wrote a similar tool called Orpheus with wxWidgets some years back for a project, so once I’d got past the fact it depended on Mono, I grudgingly allowed it on my desktop.

The build went fine until it came to installing gnome-menus, where it crapped out complaining that /usr/local/etc/mtree/BSD.gnome.dist couldn’t be found. I scratched my head, sighed, and opened the makefile up again.

That’s where the second line comes from. It wasn’t pulling in runtime dependencies, and in this case it meant the gnomehier member of USE_GNOME was being skipped.
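One way to see whether a port’s dependencies made it into the rebuilt index is to look at its INDEX entry directly. The sketch below assumes the INDEX file is pipe-separated with build dependencies in the eighth field and run dependencies in the ninth, which is my understanding of the format rather than something confirmed here; the index_deps function name and the default path are likewise assumptions:

```shell
#!/bin/sh
# Hypothetical helper: show the build and run dependencies recorded in an
# INDEX file for packages whose name starts with the given prefix.
# Assumes '|'-separated fields with build deps in $8 and run deps in $9.
index_deps() {
    pkg_prefix="$1"
    index_file="${2:-/usr/ports/INDEX-7}"
    awk -F'|' -v p="$pkg_prefix" '
        index($1, p) == 1 {
            print "package: " $1
            print "build:   " $8
            print "run:     " $9
        }' "$index_file"
}
```

For example, index_deps gnome-menus should list a gnomehier package on the run line once the patched makefile has been used to regenerate the index.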

As I write this, everything appears to be building ok. There are a couple of ports, specifically gvfs and the FuseFS port it depends on, whose distfiles I still need to fetch. They were skipped when I fetched previously because they weren’t showing up as dependencies, but that’s no biggie.

I’m sure I’m far from the only person who was affected by this. It’s amazing that two innocuous lines would have such a major effect on the system. My guess as to what happened is that somebody was editing the file with Vim, hit the d key close to the code, and then hit the up or down cursor key, inadvertently deleting those two lines. It’s happened to me in the past, so it’s plausible.

I found out the problem had been patched in the meantime, so there’s no need for me to go submitting a bug report myself. Still, I thought the experience was worth documenting, and I definitely learned a lot about the inner workings of the ports system in the process!