How to migrate to WordPress Polylang?

The WordPress Polylang plugin is a nice system for making your WordPress blog multilingual. However, if you used a different solution before and have a lot of blog posts, migrating to Polylang by hand is not a reasonable effort. So I developed a little Ruby script, polyglot2polylang.rb, for that. It is meant to migrate your blog content from the (no longer maintained) Polyglot plugin, but you can easily adapt it to any multilingual plugin that stores the different language versions inside one WordPress page, post, comment etc. using special tags, for example <lang_en>…</lang_en>.
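To illustrate what "different language versions inside one post" means, here is a minimal Ruby sketch that splits tag-based markup into one string per language. It is not taken from polyglot2polylang.rb: the helper name split_by_language and the two-letter tag pattern are assumptions made for this example, and text outside any language tag is simply ignored.

```ruby
# Hypothetical helper, not part of polyglot2polylang.rb.
# Matches <lang_xx>...</lang_xx> where xx is a two-letter language code (an assumption).
LANG_TAG = %r{<lang_([a-z]{2})>(.*?)</lang_\1>}m

# Returns a hash mapping each language code to the concatenated text
# of all sections tagged with that language.
def split_by_language(content)
  versions = Hash.new { |hash, lang| hash[lang] = +"" }
  content.scan(LANG_TAG) { |lang, text| versions[lang] << text }
  versions
end

post = "<lang_en>Hello</lang_en><lang_it>Ciao</lang_it><lang_en> world</lang_en>"
versions = split_by_language(post)
puts versions["en"]
puts versions["it"]
```

A real migration also has to decide what to do with untagged text, which typically belongs to every language version.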

Installation

  1. Download the Ruby scripts and the Gemfile from the polylang-migrate repository on GitHub and save them all into one directory. Or just git clone it, of course.
  2. In that directory, execute bundle install to install necessary gems (basically just nokogiri).

Usage

  1. Export your WordPress blog content as a WXR file using the "Tools -> Export" menu item.
  2. Syntax-check your exported WXR file. Otherwise, unexpected behavior can occur: for example, a closing tag with no corresponding opening tag will cause nokogiri to treat the current item as the last one, without any hint or warning, so not all posts of the blog will be processed. To check the syntax, you can for example use xmllint --noout, which prints parser errors and prints nothing if all is right.
  3. Run the polyglot2polylang.rb script on the exported WXR file (see the instructions in the script).
  4. Make a backup of your complete WordPress database.
  5. Make a backup of all your media files: cd httpdocs/wp-content/; cp -a uploads/ uploads.orig/. This is necessary because deleting the last media library entry that refers to a specific file also deletes that file, and we will have to delete all media library entries later!
  6. Install the WordPress Polylang plugin.
  7. Delete all posts, pages, comments, categories, tags and media library entries in your WordPress blog, using the WordPress backend interface. You can use the Bulk Delete plugin to speed up that task a bit.
  8. Make sure the files for the media library entries are available at the URLs mentioned in their <item> tags in the WXR file. Adapt the polyglot2polylang.rb script to modify these URLs accordingly, if necessary.
  9. Import the modified WordPress WXR file. When asked, check "Download and import file attachments."
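Adapting the attachment URLs (step 8) can be sketched as below. This is only an illustration, not code from polyglot2polylang.rb: the helper name rewrite_attachment_urls and the example host names are made up, and it assumes the file locations are stored in <wp:attachment_url> elements as in a standard WXR export.

```ruby
# Hypothetical helper: replaces old_base with new_base inside every
# <wp:attachment_url> element of a WXR string. Other elements are left untouched.
def rewrite_attachment_urls(wxr, old_base, new_base)
  wxr.gsub(%r{(<wp:attachment_url>)(.*?)(</wp:attachment_url>)}m) do
    open_tag, url, close_tag = $1, $2, $3
    # String#sub with a String pattern does a literal (non-regex) replacement.
    "#{open_tag}#{url.sub(old_base, new_base)}#{close_tag}"
  end
end

wxr = "<guid>http://old.example.com/?p=5</guid>" \
      "<wp:attachment_url>http://old.example.com/wp-content/uploads/a.jpg</wp:attachment_url>"
puts rewrite_attachment_urls(wxr, "http://old.example.com", "http://backup.example.com")
```

Working on the raw text keeps the sketch free of dependencies; inside the actual script it is probably more natural to change the corresponding nodes with nokogiri.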

The last step will fail for WXR files with many attachments to download and import (for example with a 500 Internal Server Error), unless your web server is configured to allow very long script execution times. You can either configure it to extend these execution times [instructions, of which step 1 is irrelevant now] or alternatively use this rather clumsy workaround (in analogy to what other people found):

  1. Use wxr-separate-attachments.rb (also in the polylang-migrate repository) to split your WXR file into attachments and content (posts and pages).
  2. Create a post (media file, page or post) with an ID that is higher than every ID that will be used by your imports. You can do this by creating some media file and then modifying its "ID" field in table wp_posts via phpMyAdmin. This step is needed because WordPress creates a media file for every WXR file you upload for import. If that media file takes an ID that an imported post needs, the imported post's ID gets shifted instead, without any error message, and so do the IDs of all following imported posts, producing total confusion with respect to ID references in the WXR file.
  3. Import the WXR file with attachments into your WordPress blog repeatedly until no more errors occur. With every import, some more media files will be successfully downloaded and imported. Of course, check "Download and import file attachments." every time.
  4. Import the WXR file with posts and pages.
  5. All content is now imported, but media library entries will appear as not attached to posts and pages [issue report]. This also happens when importing attachments after posts and pages. The solution is to create the attachment relations directly in the database, using phpMyAdmin to execute the SQL file <outfile>.attach.sql that was also created when you ran polyglot2polylang.rb.
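Two parts of this workaround can be sketched in Ruby: finding the highest post ID used in the WXR file (for step 2) and producing SQL that re-attaches media files to their parent posts (for step 5). Both helpers are hypothetical. In particular, the exact content of <outfile>.attach.sql is not documented here, so the UPDATE statement below is only a guess at its shape, based on the post_parent column of wp_posts and the <wp:post_id>, <wp:post_parent> and <wp:post_type> elements of a standard WXR export.

```ruby
# Hypothetical helper: highest <wp:post_id> in a WXR string, or nil if there is none.
def highest_post_id(wxr)
  wxr.scan(%r{<wp:post_id>(\d+)</wp:post_id>}).flatten.map(&:to_i).max
end

# Hypothetical helper: one UPDATE per attachment item that has a non-zero parent.
# The statement shape is an assumption, not the script's actual output.
def attach_sql(wxr, table: "wp_posts")
  wxr.scan(%r{<item>.*?</item>}m).filter_map do |item|
    # Skip everything that is not an attachment (the value may be CDATA-wrapped).
    next unless item[%r{<wp:post_type>(?:<!\[CDATA\[)?attachment}]
    id     = item[%r{<wp:post_id>(\d+)</wp:post_id>}, 1]
    parent = item[%r{<wp:post_parent>(\d+)</wp:post_parent>}, 1]
    next if id.nil? || parent.nil? || parent == "0"
    "UPDATE #{table} SET post_parent = #{parent} WHERE ID = #{id};"
  end
end

sample = <<~XML
  <item><wp:post_id>12</wp:post_id><wp:post_parent>0</wp:post_parent><wp:post_type>post</wp:post_type></item>
  <item><wp:post_id>34</wp:post_id><wp:post_parent>12</wp:post_parent><wp:post_type>attachment</wp:post_type></item>
XML

puts highest_post_id(sample)
puts attach_sql(sample)
```

The placeholder media file from step 2 would then need an ID above the value reported by highest_post_id.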

Limitations

  • Workaround for title translations getting lost. As documented in the script, the WXR export will not contain Polyglot markup tags in the post and page titles, so these translations get lost during this process. You can however export the original titles from your database (for example by exporting a one-column result set to CSV in phpMyAdmin) and then use an adapted version of depolyglot.rb to create SQL statements that convert your artificially unique titles back to the correct translations that got lost.
  • Changing language-specific slugs. After this process, you will have the original post slugs for one language version, and slugs with a suffix ("-italiano" in the unmodified scripts) for the other language versions. The polyglot2polylang.rb script generates another SQL file <outfile>.attach.sql that you can edit and execute to adapt these more to your liking, by doing proper translations. The original slugs are better not changed this way, to keep URL compatibility with existing links. You can however edit them in your WordPress backend: WordPress then creates a 301 redirect for the original one, and this also gets saved when backing up to a WXR file.
  • Better to interface with the database directly. The whole transition process is quite complex, and is further complicated by title translations getting lost in a WXR export and by timeouts of the WXR import. For that reason, if I had to redo this task, I would write a script that operates directly on the WordPress SQL database. The database has a clear structure, and I would rather deal with that than with all these hacks. If there is no Ruby installed on the server, the script can also run locally and access the database over a remote connection. And, when enabled with some option, it could even ask the user to correct slug names etc. interactively.

Comments

3 responses to “How to migrate to WordPress Polylang?”

  1. […] Most importantly, a new way to manage multiple languages. Most of my posts are both in Italian and English. So far, I managed this with a plugin called Polyglot, which still works but is not maintained anymore. The new system is based on Polylang; it buys me a more usable back end, a cleaner database and a better control on RSS feeds, that are now completely separated. As a result, I now have an English-language-only feed. The solution was found and implemented by sterling-silver hacker Matthias Ansorg. The migration script from Polyglot to Polylang is documented here. […]

  2. […] Above all, a new way to manage languages, which matters because the great majority of my posts are in both Italian and English. So far I managed this with a plugin called Polyglot, which however is no longer maintained. The new system is based on Polylang; it gives me a more usable backend, a much cleaner database and above all better control over RSS feeds, which are now completely separate. The solution was found and implemented by Matthias Ansorg, a 24-carat hacker. The migration from Polyglot (which inserts HTML tags into the post body) to Polylang is documented here. […]

  3. […] I take profit of this post to remind that Matt already wrote a set of ruby scripts to convert a WordPress multilingual site using Polyglot to Polylang. […]
