Spring is coming. It's not in the air, yet. But it's promising to be. Today there was an absolutely spectacular sunset over the UK, which made me happy.
UrbanDictionary's word of the day today was anablog, which I think should be modified to mean the stuff you write down with pen and paper when you're on holiday, so you can blog it properly later. I have a really big anablog I wrote on the plane home from Tobago that I have yet to post.
Warning: Near-fatal geek levels in the post below.
One of the big problems that made me follow Will in abandoning MovableType in favour of WordPress was the constant onslaught of comment spam. The problem with moving, however, is that if I want to keep continuity, I have to import the content of the old blog into the new blog. That's easy enough -- MT very nicely provides export, and WordPress is elegantly set up to import directly from MT -- but the problem is that it comes with all the spam. In the case of the now-defunct FreeTrinidad.org, that was 22MB of spam tacked onto the end of a mere 400k of actual content. Now, doubtless there were some real comments in there too, but 99% was spam.
So I wanted to import just the content, not the spam. Unfortunately, there's no nice way to do this in either MT or WordPress. So I wrote one, and in case you have this problem yourself, here it is below. Loading a 22MB+ text file into memory and then spitting it back out again tends to cause the average web server to complain, so this solution is deliberately low-memory: it uses an absolute maximum of 30k at a time, usually significantly less, and it spits output directly as it goes.
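For anyone who just wants the gist of the streaming idea, here is a minimal sketch in Python. It is an illustration only, not the script itself, and it rests on an assumption about the MT export format: that each comment sits in a section opened by a line reading COMMENT: and closed by a line of five dashes, with entries separated by a line of eight dashes. The function name strip_comments is made up for this example.

```python
def strip_comments(src, dst):
    """Copy an export stream to dst line by line, leaving out comment sections.

    Assumes (not verified against any particular MT version) that a comment
    section starts with a line "COMMENT:" and ends with a line "-----",
    and that entries are separated by a line "--------".
    """
    skipping = False
    for line in src:  # only one line is held in memory at a time
        marker = line.rstrip("\r\n")
        if skipping:
            if marker == "-----":
                # End of the comment section: drop the delimiter too.
                skipping = False
            elif marker == "--------":
                # Defensive: never swallow an entry separator.
                skipping = False
                dst.write(line)
            continue
        if marker == "COMMENT:":
            skipping = True
            continue
        dst.write(line)  # output goes out as we go, nothing is buffered
```

To use it as a filter you would call strip_comments(sys.stdin, sys.stdout) and redirect the export file in and the cleaned file out. Because it only ever looks at the current line, memory use stays flat however big the spam pile is. Note that this drops every comment, real ones included.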