The WordPress Polylang plugin is a nice system to make your WordPress blog multilingual. However, if you had a different solution in use before, and have a lot of blog posts, migrating to Polylang is an effort that cannot reasonably be done manually. So I developed a little Ruby script polyglot2polylang.rb
for that. It is meant to migrate your blog content from the (no longer maintained) Polyglot plugin. However, you can easily adapt it to work for all multilingual plugins that store different language versions inside one WordPress page / post / comment etc. by using special tags, for example <lang_en>…</lang_en>.
Installation
- Download the Ruby scripts and Gemfile from the polylang-migrate repository at Github, and save them all into one directory. Or just
git clone
it, of course. - In that directory, execute
bundle install
to install necessary gems (basically just nokogiri).
Usage
- Export your WordPress blog content as WordPress WXR file using the "Tools -> Export" menu item.
- Syntax-check your exported WXR file. (Else, unexpected behavior can occur. For example, an unexpected closing tag with no corresponding opening tag will cause nokogiri to consider the current item as the last one, without any hint or warning. Which means that not all posts of the blog will be processed.) To check the syntax, for example use
xmllint --noout
, which will print parser errors on stdout and print nothing if all is right. - Run the
polyglot2polylang.rb
script on the exported WXR file (see the instructions in the script). - Make a backup of your complete WordPress database.
- Make a backup of all your media files: cd
httpdocs/wp-content/; cp -a uploads/ uploads.orig/;
. This is because when deleting the last media library entry that refers to a specific file, that file will be deleted, too. And we will have to delete all media library entries later! - Install the WordPress Polylang plugin.
- Delete all posts, pages, comments, categories, tags and media library entries in your WordPress blog, using the WordPress backend interface. You can use the Bulk Delete plugin to speed up that task a bit.
- Make sure the files for the media library entries are available at the URLs mentioned in their <item> tags in the WXR file. Adapt the
polyglot2polylang.rb
script to modify these URLs accordingly, if necessary. - Import the modified WordPress WXR file. When asked, check "Download and import file attachments."
The last step wil fail for WXR files with many attachments to download and import (incl. 500 Internal Server Error), except if your web server is configured to allow very long script execution times. You can either configure it to extend these execution times [instructions, of which step 1 is irrelevant now] or alternatively use this, rather clumsy, workaround (in analogy to what other people found):
- Use
wxr-separate-attachments.rb
(also in the to split your WXR file into attachments and content (posts and pages). - Create a post (media file, page or post) with an ID that is higher than every ID that will be used by your imports. This can be done by creating some media file, then modifying its field "ID" in table wp_posts, via phpMyAdmin. This step is needed because WordPress will create a media file for every WXR file you upload for import, and if its ID is one that should be available to an imported post, the imported post's ID will get shifted instead, without any error message, and the IDs of all following imported posts likewise, producing total cofusion with respect to ID references in the WXR file.
- Import the WXR file with attachments into your WordPress blog as many times in a row until no error happens. After every import, some more media files will be successfully downloaded and imported. Of course, check "Download and import file attachments." every time.
- Import the WXR file with posts and pages.
- You will now have all content imported, but media library entries will appear as not attached to posts and pages [issue report]. This also happens when importing attachments after posts and pages. The solution is to create the attaching directly in the database, using phpMyAdmin to execute the SQL file <outfile>.attach.sql that was also created when you did run
polyglot2polylang.rb
.
Limitations
- Workaround for title translations getting lost. As documented in the script, the WXR export will not contain Polyglot markup tags in the post and page titles, so these translations get lost during this process. You can however get the original titles as an export from your database (for example by exporting a one-column result set to CSV in phpMyAdmin) and then use an adapted version of
depolyglot.rb
to create SQL statements that will convert your artificially-unique titles back to the correct translations that did get lost. - Changing language-specific slugs. After this process, you will have the original post slugs for one language version and other language versions with an appendix ("-italiano" in the unmodified scripts). The
polyglot2polylang.rb
script generates another SQL file <outfile>.attach.sql that you can edit and execute to adapt these more to your liking, by doing proper translations. The original slugs are better not changed this way to keep URL compatibility with existing links, but you can edit them in your WordPress backend – WordPress then creates a 302 forwarder for the original one, and this also gets saved when backing up to a WXR file. - Better interface with the database directly. The whole process of transition is quite a complex, and further complicated by title translations getting lost in an WXR export, and timeouts of the WXR import. For that reason, if I had to re-do this task, I would write a script that directly operates on the WordPress SQL database. The database has a clear structure, and I'd rather like to deal with that than all these hacks. If there's no Ruby installed on the server, the script can also run locally and access the database with a remote connection. And, when enabled with some option, it could even ask the user for interactibely correcting slug names etc..
Leave a Reply