Importing MovableType into WordPress without comments
Warning: Near-fatal geek levels in the post below.
One of the big problems that made me follow Will in abandoning MovableType in favour of WordPress was the constant onslaught of comment spam. The problem with moving, however, is that if I want to keep continuity, I have to import the content of the old blog into the new blog. That's easy enough -- MT very nicely provides export, and WordPress is elegantly set up to import directly from MT's export files -- but the problem is that the export comes with all the spam. In the case of the now-defunct FreeTrinidad.org, that was 22MB of spam tacked onto the end of a mere 400k of actual content. Now, doubtless there were some real comments in there too, but 99% was spam.
So I wanted to import just the content, not the spam. Unfortunately, there's no nice way to do this in either MT or WordPress. So I wrote one, and in case you have this problem yourself, here it is below. Loading a 22MB+ text file into memory and then spitting it back out again tends to cause the average web server to complain, so this solution is deliberately low-memory: it only ever holds three lines of the file at once, each read in chunks of at most 10,000 bytes, so it uses an absolute maximum of about 30k at a time, usually significantly less, and it spits output directly as it goes.
<?php
/**
 * MovableType export file comment-stripper
 *
 * Instructions:
 *
 * This is a PHP CLI script, i.e. you're supposed to run it like so:
 *
 *     php.exe this_script_name.php mt_export_file.txt
 *
 * It will then output everything to the command line -- this lets you
 * see if it's working. A good idea, once it's working, is to instead
 * pipe all the output to a file, e.g. mt_no_comments.txt. Like so:
 *
 *     php.exe this_script_name.php mt_export_file.txt > mt_no_comments.txt
 *
 * You can then upload the resulting file into whatever your new blog
 * software is. In our case, WordPress, which handled it perfectly.
 */

// bail out early if no file was given on the command line
if (!isset($_SERVER['argv'][1])) {
    echo "Usage: php.exe this_script_name.php mt_export_file.txt\n";
    exit(1);
}

// first argument is file, open it
$filename = $_SERVER['argv'][1];
$handle = fopen($filename, "r");

/*
algorithm:

read every line
    if not incomment
        add prevline2 to filtered
    endif
    shift prevline1 to prevline2
    shift current line to prevline1
    store current line
    if currentline is "COMMENT:" then
        set incomment = true
    endif
    if currentline is "TITLE:" then
        set incomment = false
    endif
at end of file, if not incomment
    output the lines still held in the buffer
endif
*/

// initialize vars
$filtered = "";
$prevline2 = "";
$prevline1 = "";
$currentline = "";
$inComment = false;

// don't do anything if the file didn't open
if ($handle) {
    // cycle through the whole file
    while (!feof($handle)) {
        // output is delayed by two lines, so that the lines just before
        // a "COMMENT:" or "TITLE:" marker can be dropped or kept with it
        if (!$inComment) {
            $filtered .= $prevline2;
        }
        $prevline2 = $prevline1;
        $prevline1 = $currentline;
        $currentline = fgets($handle, 10000);

        if (strpos($currentline, "COMMENT:") !== false) {
            $inComment = true;
        } else if (strpos($currentline, "TITLE:") !== false) {
            $inComment = false;
        }

        echo $filtered;
        $filtered = "";
    }

    // the last lines read are still sitting in the buffer; without this
    // they would never be written out
    if (!$inComment) {
        echo $prevline2, $prevline1, $currentline;
    }

    fclose($handle);
} else {
    echo "Couldn't open $filename !\n";
}
?>
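If you want to see what the filter keeps and what it throws away before pointing it at a real export, here is a minimal sketch that runs the same rolling three-line buffer over a tiny in-memory sample. The function name stripMtComments() and the sample entries are made up for illustration, and the sample only approximates the layout of an MT export file (sections separated by "-----", entries by "--------"); it is not taken from a real blog. Unlike a bare read loop, the sketch also writes out whatever is still buffered when the input ends.

```php
<?php
// Hypothetical helper (not part of MT or WordPress): copies $in to $out,
// skipping everything from a "COMMENT:" marker up to the next "TITLE:".
function stripMtComments($in, $out)
{
    $prev2 = "";
    $prev1 = "";
    $current = "";
    $inComment = false;

    while (($line = fgets($in)) !== false) {
        // Output lags two lines behind the read position.
        if (!$inComment) {
            fwrite($out, $prev2);
        }
        $prev2 = $prev1;
        $prev1 = $current;
        $current = $line;

        if (strpos($current, "COMMENT:") !== false) {
            $inComment = true;
        } elseif (strpos($current, "TITLE:") !== false) {
            $inComment = false;
        }
    }

    // Write out the lines still buffered at end of input.
    if (!$inComment) {
        fwrite($out, $prev2 . $prev1 . $current);
    }
}

// Invented sample data, shaped roughly like an MT export.
$sample = <<<'EOT'
AUTHOR: Alice
TITLE: First post
DATE: 01/31/2002 03:31:05 PM
-----
BODY:
Hello world.
-----
KEYWORDS:

-----
COMMENT:
AUTHOR: Spammer
DATE: 02/01/2002 01:00:00 AM
Buy cheap pills
-----
--------
AUTHOR: Alice
TITLE: Second post
DATE: 02/02/2002 10:00:00 AM
-----
BODY:
Still here.
-----
--------
EOT;

// Use in-memory streams so the sketch needs no files on disk.
$in = fopen('php://memory', 'w+');
fwrite($in, $sample);
rewind($in);
$out = fopen('php://memory', 'w+');

stripMtComments($in, $out);

rewind($out);
$result = stream_get_contents($out);
echo $result;
```

Both posts come through and the spam comment block is gone. Note that the two lines sitting in the buffer when a "COMMENT:" marker turns up (here the empty keywords line and its "-----" separator) are dropped along with the comment, while the "--------" and "AUTHOR:" lines just before the next "TITLE:" are kept. That is why the buffer is two lines deep.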
Having sorted out this problem also means it will be easier to migrate all the remaining MT blogs (dammit, I keep forgetting there are so many of you...).