Importing MovableType into WordPress without comments

Warning: Near-fatal geek levels in the post below.

One of the big problems that made me follow Will in abandoning MovableType in favour of WordPress was the constant onslaught of comment spam. The problem with moving, however, is that if I want to keep continuity, I have to import the content of the old blog into the new blog. That's easy enough -- MT very nicely provides an export, and WordPress is elegantly set up to import directly from MovableType -- but the problem is that the export comes with all the spam. In the case of the now-defunct FreeTrinidad.org, that was 22MB of spam tacked onto the end of a mere 400k of actual content. Now, doubtless there were some real comments in there too, but 99% was spam.

So I wanted to import just the content, not the spam. Unfortunately, there's no nice way to do this in either MT or WordPress. So I wrote one, and in case you have this problem yourself, here it is below. Loading a 22MB+ text file into memory and then spitting it back out again tends to cause the average web server to complain, so this solution is deliberately low-memory: it buffers only three lines at a time, each capped at about 10k by the read call, so it uses an absolute maximum of roughly 30k, usually significantly less, and it spits output directly as it goes.

<?php
/** MovableType export file comment-stripper
  Instructions:
  This is a PHP CLI script, i.e. you're supposed to run it like so:
     php.exe this_script_name.php mt_export_file.txt
  It will then output everything to the command line -- this lets you 
  see if it's working. A good idea, once it's working, is to instead
  pipe all the output to a file, e.g. mt_no_comments.txt. Like so:
     php.exe this_script_name.php mt_export_file.txt > mt_no_comments.txt
  You can then upload the resulting file into whatever your 
  new blog software is. In our case, WordPress, which handled it perfectly.
*/

// first argument is file, open it
$filename = $_SERVER['argv'][1];
$handle = fopen($filename, "r");

/*
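algorithm:
for every line in the file
  if not incomment
    output prevline2 (the line read two iterations ago)
  endif
  shift prevline1 to prevline2
  shift current line to prevline1
  read the next line into current line
  if current line is a "COMMENT:" marker then
     set incomment = true
  endif
  if current line is a "TITLE:" marker then
     set incomment = false
  endif

Output lags two lines behind the read position, so the two lines just
before "COMMENT:" are dropped along with the comment, and the two lines
just before "TITLE:" (the start of the next entry) are kept.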
*/

// initialize vars
$filtered = "";
$prevline2 = "";
$prevline1 = "";
$currentline = "";
$inComment = false;

// don't do anything if the file didn't open
if ($handle)
{
	// cycle through the whole file
	while (!feof($handle))
	{
	   
		if ( ! $inComment )
		{
			$filtered .= $prevline2;
		}

		$prevline2 = $prevline1;
		$prevline1 = $currentline;
		$currentline = fgets($handle, 10000);
		
		if ( strpos($currentline, "COMMENT:" ) !== false )
		{
			$inComment = true;	
			
		} else if ( strpos($currentline, "TITLE:") !== false ) 
		{
			$inComment = false;			
		}
		
		echo $filtered;
		$filtered = "";
		
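	}

	// the loop ends with up to three lines still buffered;
	// print them unless they belong to a comment
	if ( ! $inComment )
	{
		echo $prevline2 . $prevline1 . $currentline;
	}
	fclose($handle);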
	
} else {

	echo "Couldn't open $filename !\n";
		
}

?> 
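
To see what the two-line lag does without feeding it a 22MB file, here is a minimal sketch of the same lag applied to a handful of lines held in an array. The function name strip_mt_comments and the sample data are made up for illustration and are not part of the script above; the sample only approximates the shape of an MT export. The sketch assumes PHP 7 or later, and it also emits whatever is still buffered once the input ends.

```php
<?php
// Hypothetical helper (not part of the script above): the same two-line lag,
// applied to an array of lines so it can be tried on a tiny sample.
function strip_mt_comments(array $lines): string
{
    $out = "";
    $prev2 = "";
    $prev1 = "";
    $current = "";
    $inComment = false;

    foreach ($lines as $line) {
        // emit the line read two steps ago, unless a comment is being skipped
        if (!$inComment) {
            $out .= $prev2;
        }
        $prev2 = $prev1;
        $prev1 = $current;
        $current = $line;

        if (strpos($current, "COMMENT:") !== false) {
            $inComment = true;
        } elseif (strpos($current, "TITLE:") !== false) {
            $inComment = false;
        }
    }

    // emit whatever is still buffered once the input ends
    if (!$inComment) {
        $out .= $prev2 . $prev1 . $current;
    }
    return $out;
}

// Made-up sample: one entry with one comment, then the start of a second entry.
$sample = [
    "AUTHOR: me\n",
    "TITLE: First post\n",
    "-----\n",
    "BODY:\n",
    "Hello world\n",
    "-----\n",
    "KEYWORDS:\n",
    "\n",
    "-----\n",
    "COMMENT:\n",
    "AUTHOR: spammer\n",
    "Buy cheap pills\n",
    "-----\n",
    "--------\n",
    "AUTHOR: me\n",
    "TITLE: Second post\n",
    "-----\n",
];

echo strip_mt_comments($sample);
```

On this sample the comment block disappears, together with the blank line and the "-----" that sat just before "COMMENT:", while the "--------" separator and the "AUTHOR:" line ahead of the second "TITLE:" survive. That is the whole reason for holding two lines back.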

Having sorted out this problem also means it will be easier to migrate all the remaining MT blogs (dammit, I keep forgetting there are so many of you...).
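
For those remaining migrations, a quick sanity check before uploading each stripped file might look something like the sketch below. It counts the lines that begin with "COMMENT:", reading one line at a time so memory stays low; a count of zero would suggest the stripping worked. The function name count_marker_lines and the file name check_export.php are hypothetical, and the sketch assumes PHP 7 or later.

```php
<?php
// Hypothetical checker (not part of the original script): counts lines that
// begin with a marker, reading one line at a time to keep memory use low.
function count_marker_lines(string $path, string $marker = "COMMENT:"): int
{
    $handle = fopen($path, "r");
    if (!$handle) {
        return -1; // file could not be opened
    }
    $count = 0;
    while (($line = fgets($handle)) !== false) {
        // only count the marker when it sits at the very start of the line
        if (strpos($line, $marker) === 0) {
            $count++;
        }
    }
    fclose($handle);
    return $count;
}

// Usage: php check_export.php mt_no_comments.txt
if (isset($argv[1])) {
    $n = count_marker_lines($argv[1]);
    echo $n === -1 ? "Couldn't open {$argv[1]}\n" : "$n COMMENT: lines found\n";
}
```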