Internationalizing a static site with PHP and nginx

I'm getting married (yay!) and a majority of the guests speak some combination of Swedish, Nepali, or English.

I thought it would be nifty to have a multilingual wedding site, where the same URL would present the same content in a different language depending on the visitor.

Here's how I did it.[1]

Tooling choices

I decided to use nginx for the web server. I've been migrating from Apache over the last few years, and I figured this would be a great use case.

I picked PHP for scripting because I only needed something that could stitch HTML files together and read a few server variables, nothing fancier.

I chose GNU gettext to handle the translation files, because it works directly with the .po and .mo files that Poedit produces.[2]

Finally, the translation strings themselves were provided by friends and family who speak Nepali or Swedish.

Selecting a language

First, I had to decide on a mechanism for selecting a language other than the URL path. GeoIP wasn't feasible, because our guests' geography isn't a good predictor of their language preference.

Luckily, there's the HTTP Accept-Language header, which specifies what languages a user agent accepts.

Of course, this isn't always accurate, so I also wanted a way to select a language manually and have that preference remembered.

And finally, for testing, I wanted to be able to force a particular language to be used no matter what.

In the end, I settled on three selection methods, listed from highest to lowest priority:

  • URL query parameter
  • Saved user preference (via cookie)
  • The Accept-Language header

So, if you've never been to the site, you'll get a translation based on your Accept-Language header. If that's wrong, you can set your preference and it'll persist in a cookie. And if you want to override these and force a language, a URL parameter will do it.

For consistency, I decided that the cookie and the URL parameter should both be called "locale".

Accept-Language Header note

The header has a format like this:

Accept-Language: en;q=0.7

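The header is a comma-separated list of language tags, each with an optional q ("quality") value from 0 to 1; a tag without a q value counts as 1. In theory, q says how strongly the user wants content in that language, but really it's a best guess at the user's language preferences.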

Rather than going crazy with this parameter, I decided the presence of a particular language code means that the user will find it pleasantly surprising to get content in that language 😊
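To make the header format concrete, here is a small sketch (assuming PHP 7.1 or later; `parse_accept_language` and `header_mentions` are hypothetical helper names, not part of any library) that splits the header into tags and q values, plus a presence check in the spirit of the rule above that ignores q entirely. It compares whole primary subtags, so `ne` matches `ne` or `ne-NP` but not an unrelated tag that happens to contain those letters.

```php
<?php
// Split an Accept-Language header into [tag => q] pairs, e.g.
// "sv-SE,sv;q=0.9,en;q=0.7" becomes ['sv-se' => 1.0, 'sv' => 0.9, 'en' => 0.7].
// Tags are lowercased; a tag without ";q=" gets the default q of 1.0.
function parse_accept_language(string $header): array
{
    $result = [];
    foreach (explode(',', $header) as $part) {
        $pieces = explode(';', $part);
        $tag = strtolower(trim($pieces[0]));
        if ($tag === '') {
            continue; // skip empty entries such as a trailing comma
        }
        $q = 1.0;
        foreach (array_slice($pieces, 1) as $param) {
            $param = trim($param);
            if (strncasecmp($param, 'q=', 2) === 0) {
                $q = (float) substr($param, 2);
            }
        }
        $result[$tag] = $q;
    }
    return $result;
}

// Presence check: true if $language (a primary subtag such as "ne")
// appears anywhere in the header, regardless of its q value.
function header_mentions(string $header, string $language): bool
{
    foreach (array_keys(parse_accept_language($header)) as $tag) {
        if (explode('-', (string) $tag)[0] === strtolower($language)) {
            return true;
        }
    }
    return false;
}
```

For example, `header_mentions('en-US,ne;q=0.5', 'ne')` is true even though Nepali has the lower q value, which is exactly the "presence means a pleasant surprise" rule.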

Forcing PHP translation

When requesting any HTML page, I wanted to have a PHP script evaluate the proper locale first and do the translation server-side.

This meant all HTML request traffic had to be routed to a single PHP script, and that script, in turn, had to be given the user's language preference.

Using FastCGI, I put the following into my nginx configuration:

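location ~ [^/]\.html?(/|$) {
    fastcgi_param PREF_LANGUAGE $pref_language;
    include fastcgi_params;
    # PHP-FPM expects an absolute path; this assumes router.php sits in the document root
    fastcgi_param SCRIPT_FILENAME $document_root/router.php;
    fastcgi_pass unix:/var/run/php5-fpm.sock;
}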

This forces all .htm and .html locations through the router.php file, which receives the request URI as $_SERVER['REQUEST_URI'] and parses and routes it accordingly.

The user's language preference is also passed along to PHP as $_SERVER['PREF_LANGUAGE'].
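The router itself isn't shown here, so the following is only a sketch of one step it might perform, under assumptions of my own: the page templates live in some directory you pass in, and `resolve_template` is a hypothetical name (PHP 7.1 or later). It turns `REQUEST_URI` into a file path, dropping the query string so that `?locale=sv_SE` doesn't affect which file is chosen, and it rejects anything that resolves outside the template directory.

```php
<?php
// Hypothetical helper: map a request URI to an HTML file under $template_root.
// The query string is ignored, and paths that resolve outside the root
// (e.g. "/../secret.html") are rejected. Returns null when there is nothing
// to serve, so the caller can send a 404.
function resolve_template(string $request_uri, string $template_root): ?string
{
    $path = parse_url($request_uri, PHP_URL_PATH);
    if (!is_string($path)) {
        return null; // malformed URI or no path component
    }
    $decoded = rawurldecode(ltrim($path, '/'));
    if (strpos($decoded, "\0") !== false) {
        return null; // refuse embedded NUL bytes outright
    }
    $root = realpath($template_root);
    $file = realpath($template_root . '/' . $decoded);
    if ($root === false || $file === false || !is_file($file)) {
        return null;
    }
    // Accept only files strictly inside the root directory.
    $prefix = $root . DIRECTORY_SEPARATOR;
    return strncmp($file, $prefix, strlen($prefix)) === 0 ? $file : null;
}
```

In router.php this might be called as `resolve_template($_SERVER['REQUEST_URI'], __DIR__ . '/templates')` (the `templates` directory being an assumption), sending a 404 whenever it returns null.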

Caching

I didn't want to call PHP over and over to render what are really static pages, so I decided to use nginx's built-in caching as much as I could.

Every URI routed through PHP needs one cache entry per language preference, but everything else about the request (including URL parameters) can be ignored for caching purposes.

Since nginx's $uri doesn't include the query string, the ideal cache key is just:

fastcgi_cache_key "$uri$pref_language";

Defining the user pref variable

Unfortunately, there is no magic variable called $pref_language in nginx 😀

We have to define it ourselves, and because nginx's if directive has no else branch or boolean operators, it ends up looking like this:[3]

set $pref_language 'en_US';
if ($http_accept_language ~* 'sv') {
    set $pref_language 'sv_SE';
}
if ($http_accept_language ~* 'ne') {
    set $pref_language 'ne_NP';
}
if ($cookie_locale ~* 'sv') {
    set $pref_language 'sv_SE';
}
if ($cookie_locale ~* 'ne') {
    set $pref_language 'ne_NP';
}
if ($arg_locale ~* 'sv') {
    set $pref_language 'sv_SE';
}
if ($arg_locale ~* 'ne') {
    set $pref_language 'ne_NP';
}

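This sets a reasonable default (en_US) and then checks the Accept-Language header, the 'locale' cookie, and finally the 'locale' URL query parameter for any supported language. Because each later match overwrites an earlier one, the URL parameter beats the cookie, which beats the header, matching the priority list above.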

If it finds any, that language is set as the language preference, otherwise the default is passed along to PHP.
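Footnote 3 notes that this logic is better placed in PHP. As a sketch (assuming PHP 7.1 or later; `pref_language`, `DEFAULT_LOCALE`, and `LOCALE_PATTERNS` are hypothetical names), the same precedence could look like this. It deliberately keeps nginx's case-insensitive substring tests so that both versions always agree, which matters because nginx still computes its own copy for the cache key.

```php
<?php
// Mirrors the nginx if-chain: start from the default, then test the
// Accept-Language header, the "locale" cookie and the "locale" URL parameter
// in that order. A later match overwrites an earlier one, so the URL
// parameter has the final say. Within one source, "ne" is tested after "sv",
// matching the order of the nginx if blocks.
const DEFAULT_LOCALE = 'en_US';
const LOCALE_PATTERNS = ['sv' => 'sv_SE', 'ne' => 'ne_NP'];

function pref_language(?string $accept_language, ?string $cookie_locale, ?string $arg_locale): string
{
    $locale = DEFAULT_LOCALE;
    foreach ([$accept_language, $cookie_locale, $arg_locale] as $source) {
        if ($source === null) {
            continue; // header, cookie or parameter not present
        }
        foreach (LOCALE_PATTERNS as $needle => $candidate) {
            // Case-insensitive substring test, like nginx's ~* 'sv'
            if (stripos($source, $needle) !== false) {
                $locale = $candidate;
            }
        }
    }
    return $locale;
}
```

It would be called as `pref_language($_SERVER['HTTP_ACCEPT_LANGUAGE'] ?? null, $_COOKIE['locale'] ?? null, $_GET['locale'] ?? null)`. One consequence shared with the nginx version: `?locale=en_US` matches neither pattern, so it cannot switch a visitor with a Swedish or Nepali header or cookie back to English. An explicit English check for the cookie and the URL parameter would close that gap.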

Server-side translation

Finally, we need to get the server to pull up the proper translation and send it back.

First, PHP needs the gettext extension enabled, so the following line (or something close to it) needs to be in the php.ini file:

extension=gettext.so

Now, for the PHP routing file itself.

Checking for a valid locale, using the PREF_LANGUAGE value that nginx fills in from the $pref_language variable defined above, is simple enough:

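$valid_locales = ['en_US', 'sv_SE', 'ne_NP'];
$locale = 'en_US'; // fallback if nginx passes nothing usable
// isset() avoids an undefined-index notice when the script runs without nginx
if (isset($_SERVER['PREF_LANGUAGE'])
    && in_array($_SERVER['PREF_LANGUAGE'], $valid_locales, true)) {
    $locale = $_SERVER['PREF_LANGUAGE'];
}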

The second part is a bit trickier because gettext is a little quirky and isn't easy to debug.

As of this writing, PHP's Linux implementation of gettext requires you to actually have the desired target locale installed on your machine. If you don't, everything will appear to be running fine, but you will never see any translated text.

I looked into some of the workarounds, but in the end, the fastest way was to just install the locales. You can do this with the locale-gen command. I installed both the regular locales (ne_NP, sv_SE) and their UTF-8 versions (ne_NP.utf8, sv_SE.utf8) just to be safe.

After that, the PHP file can switch to whichever of the now-installed locales was selected:

putenv("LANG=" . $locale);
putenv("LANGUAGE=" . $locale);
setlocale(LC_ALL, $locale);
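Because a missing locale fails silently, it helps to look at what `setlocale()` returns: the name of the locale it actually set, or `false` if none of the candidates is installed. When given an array, it tries each entry in turn, so the sketch below (hypothetical helper `activate_locale`, PHP 7.1 or later) tries the common UTF-8 spellings before the bare name, covering both variants installed with locale-gen.

```php
<?php
// Switch the process to $locale for gettext. setlocale() accepts an array of
// candidates and returns the first one it manages to set, or false if none is
// installed; that false is the "everything looks fine but nothing is
// translated" case, so callers should check it.
function activate_locale(string $locale)
{
    putenv('LANG=' . $locale);
    putenv('LANGUAGE=' . $locale);
    // Try the UTF-8 variants first (spelled differently across systems),
    // then the bare name.
    return setlocale(LC_ALL, [$locale . '.UTF-8', $locale . '.utf8', $locale]);
}
```

A caller could then write `if (activate_locale($locale) === false) { error_log("Locale $locale is not installed"); }`, turning the silent failure into a log line.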

Finally, gettext needs to know where to find the .mo files compiled by Poedit.[4]

In the following example, the folder structure is assumed to look like this:

./
├── router.php
└── Locale/
    ├── en_US/
    |   └── LC_MESSAGES/
    |       └── messages.mo
    ├── ne_NP/
    |   └── LC_MESSAGES/
    |       └── messages.mo
    └── sv_SE/
        └── LC_MESSAGES/
            └── messages.mo

The encoding is also assumed to be UTF-8, which can be set in Poedit.
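Since gettext quietly returns the original string when it can't find a catalog, a quick existence check against this layout can save some debugging. A sketch, with `missing_catalogs` as a hypothetical helper (PHP 7.1 or later):

```php
<?php
// List the locales whose compiled catalog is missing or unreadable, assuming
// the <base>/<locale>/LC_MESSAGES/<domain>.mo layout shown above.
function missing_catalogs(string $base_dir, array $locales, string $domain = 'messages'): array
{
    $missing = [];
    foreach ($locales as $locale) {
        $path = $base_dir . '/' . $locale . '/LC_MESSAGES/' . $domain . '.mo';
        if (!is_readable($path)) {
            $missing[] = $locale;
        }
    }
    return $missing;
}
```

For instance, a non-empty result from `missing_catalogs(__DIR__ . '/Locale', $valid_locales)` points straight at a missing or misnamed .mo file.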

Given the above, the .mo files are located with the following code:

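$folder_name = __DIR__ . "/Locale"; // absolute path, so it doesn't depend on the working directory
$translation_file = "messages";      // the text domain, i.e. messages.mo
bindtextdomain($translation_file, $folder_name);
bind_textdomain_codeset($translation_file, 'UTF-8');
textdomain($translation_file);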

From this point on, _() (PHP's built-in alias for gettext()) returns the translation of its argument in the selected locale, so _("Text") evaluates to the translated version of "Text", assuming everything went well. If no translation is found, it returns the original string unchanged.

Code

Unfortunately, I bought a proprietary (but very nice) theme for the skeleton of the site, so I can't share the full code in a public repo. If there's interest, though, I can check in some of the example configs shown here.

Hope this helps some others wandering into internationalization territory!



  1. My code was much more redundant and complicated; I've simplified it here so others get a more usable example. ↩︎

  2. Poedit is a program that lets non-technical people create .po translation files and compile them to .mo files, and it's way better than handing your friends and family a random list of strings and asking them to translate it. ↩︎

  3. It's much better to define this variable in PHP, but it also needs to be defined (at least partially) in nginx for caching. My actual version avoids one long chain of if statements, but I've simplified it here for clarity. ↩︎

  4. You can also compile the .po files into .mo files yourself with the msgfmt command. ↩︎