30 Jan 2013

Supporting (X)GETTEXT on Windows - From the Ground Up

Background Info

So, you may have fiddled with GETTEXT in your PHP files, or you may have used PoEdit to translate an app or a site. For most people that's as far as it goes, but you can go further by running XGETTEXT on the command line to produce your own .pot file for distribution (or for your own use, come to that).
If you read a previous article, Creating .pot Files in PoEdit for Translators, you'll remember that PoEdit can create these for you. Well, it can, but not fully-featured .pot files. PoEdit was developed to create and edit .po files, not .pot files; I have that from the horse's mouth, PoEdit's author Václav Slavík. So what's the difference?

.pot vs .po Files

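.pot (Portable Object Template) files are templates for producing individual localizations: they hold the original strings with empty translations. A translator turns a copy into a language-specific .po file, and it's the .po file, not the .pot file, that gets compiled into the all-important .mo file. To build a fully-featured .pot file, we can use GNU's (X)GETTEXT.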

Get XGETTEXT for Windows

XGETTEXT is a Unix command-line tool, so it doesn't run natively on Windows, but the GnuWin32 project provides Windows ports of GNU tools, gettext included: GnuWin32.
Follow the link, download and install the package.
You may install it anywhere, but don't place it in the root directory.
Congratulations, you've now installed XGETTEXT. Well. Almost.
To use XGETTEXT comfortably, you'll need to make its executables available from any directory, not just from gnuwin32/bin/. To do this, we add that directory to the Path in our Windows environment variables.

Add a Path to gnuwin32

  1. Right-click Computer (or My Computer) in the Start Menu and choose Properties from the shortcut menu.
  2. In the window that opens, click the Advanced system settings link.
  3. Then, press the Environment Variables... button on the Advanced tab.
  4. Then on the next dialog, choose Path in the System Variables box (or use User variables, if you only want to add XGETTEXT for yourself). Then press Edit...
  5. Now add the path to the directory where the gettext executables live. For me it was C:\Program Files (x86)\gnuwin32\bin, but your installation may be different. You must separate this path from the last existing one with a semicolon (;).
  6. OK, now we should be done setting up XGETTEXT on Windows; we just need to test it. Open a new command prompt (cmd.exe), since windows that were already open won't see the new path, and type:
    C:\> xgettext --help
    You should then see xgettext's usage information, listing its options.

XGETTEXT Commands and Options

I've been told on many a website that the XGETTEXT documentation is excellent. I really don't want to disagree, but unless you're a Unix junkie with experience in decoding hieroglyphics, you won't find it easy to follow. It gave me a nosebleed! Mind you, that may say more about me than about the documentation. So, how do you get it to work? Well, it's surprisingly easy if you ignore the --help output, which, by the way, also makes very little sense.
You may think that I'm being a bit unfair, and yes, I suppose I am, since once you understand the various options and the syntax, it does kind of make sense.

Examples of Command Line Statements to Build a (.pot) File

Before I just give a blind example, I'll run through the basic structure:
Set directories: -D directory
    e.g. -D c:/xampp/htdocs/timetabler/includes/*.php c:/xampp/htdocs/timetabler/mainsite/*.php
    Strictly speaking, -D adds a directory to the list that xgettext searches for input files. The input files themselves (here, wildcard patterns for the .php files that contain your gettext strings) follow as ordinary arguments, separated by spaces.
Add translator comments: -c
    A plain -c adds all PHP comments that immediately precede the gettext strings as 'Notes for Translators'.
Add tagged translator comments: -c[tag]
    e.g. -cTRANSLATORS
    With a tag, only comment blocks that start with that string are extracted. Note that any further comments between the tagged comment and the gettext string are included too. The tag is useful if you have general comments directly before your translator comments, as it keeps them out of the output.
Sort strings by file location: -F
    Sorts the output by file location (file name, then line number).
Search for strings: -k[keywordspec]
    e.g. -k__
    Adds a keyword: the name of a function whose string arguments xgettext should extract. So -k__ makes xgettext also pick up calls like echo __('hello');. For PHP, _(), gettext() and ngettext() (among others) are recognized by default, so they don't need a -k option.
Apply encoding: --from-code=name
    e.g. --from-code=UTF-8
    Use this if your files may contain non-ASCII text.
File language: -L langname
    e.g. -L php
    Tells xgettext how to parse the files containing the gettext strings and comments.
Output location: -p directory
    e.g.1 -p c:/xampp/htdocs/timetabler/lang (absolute path)
    e.g.2 -p timetabler/lang (relative path)
    Sets the directory in which the output file is saved. If omitted, the file is saved in the directory from which you run xgettext.
Output filename: -o filename
    e.g. -o mytemplate.pot
    Creates a file called mytemplate.pot. xgettext uses the name exactly as given and doesn't add an extension, so include .pot yourself.
Add filename and line info: -n
    Generates the #: filename:line lines in the .pot file, which PoEdit can display when you right-click a string to be translated. This is xgettext's default behaviour anyway, so -n is optional.
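To see what -k__ and -cTRANSLATORS act on, here is a minimal sketch of a PHP source file. The __() helper is hypothetical (it is not built into PHP): it passes its text to gettext() when the gettext extension is loaded and returns it unchanged otherwise. Run with -k__ -cTRANSLATORS, xgettext should extract both strings, but only the comment that starts with TRANSLATORS: should end up as a note for translators; with a plain -c, both comments would be copied.

```php
<?php
// Hypothetical __() helper, not part of PHP. xgettext only extracts
// __("...") calls when it is run with -k__. Without the gettext
// extension the text is returned untranslated, so the page still works.
function __(string $text): string
{
    return function_exists('gettext') ? gettext($text) : $text;
}

// General note for developers: this block doesn't start with TRANSLATORS,
// so -cTRANSLATORS leaves it out of the .pot file.
$title = __("Profile");

// TRANSLATORS: keep this short; it is used as a button label.
$button = __("Change Email");

echo $title, PHP_EOL, $button, PHP_EOL;
```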
So armed with the codes above, you should now be able to enter something like the following at the command prompt:
C:\> xgettext -n -c -D c:/xampp/htdocs/diafol/includes/*.php c:/xampp/htdocs/diafol/config/*.php --from-code=UTF-8 -k__ -L php -p c:/xampp/htdocs/diafol/lang/orig -o mytemplate.pot
Breaking this down into parts, we're trying to:
  1. Add filename and line references to the output file (-n).
  2. Add translator comments to the output file, with no particular tag (-c).
  3. Read the .php files in the c:/xampp/htdocs/diafol/includes/ and c:/xampp/htdocs/diafol/config/ directories, which contain the GETTEXT strings (-D followed by the two wildcard patterns).
  4. Treat the input files as UTF-8 (--from-code=UTF-8).
  5. Also extract strings wrapped in __(...) (-k__); _(...) and ngettext(...) calls are picked up by default for PHP.
  6. Let XGETTEXT know that the input files are written in PHP (-L php).
  7. Output the new file to the c:/xampp/htdocs/diafol/lang/orig directory (-p c:/xampp/htdocs/diafol/lang/orig).
  8. Name the file mytemplate.pot (-o mytemplate.pot).
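If you regenerate the template often, you might prefer to build the command in a small PHP script. The sketch below is one way to do that, not the only one: the function name build_xgettext_command is hypothetical, and it uses glob() to expand the *.php patterns itself, so it doesn't rely on how the console handles wildcards and can pass every file by its full path instead of using -D.

```php
<?php
// Sketch: builds an xgettext command like the one above.
// glob() expands the *.php patterns here in PHP, and every matching file
// is passed to xgettext as its own argument, so -D isn't needed.
function build_xgettext_command(array $patterns, string $outDir, string $outFile): string
{
    $files = [];
    foreach ($patterns as $pattern) {
        // glob() can return false on error; treat that as "no matches".
        $files = array_merge($files, glob($pattern) ?: []);
    }

    $args = array_merge([
        '-n',                // #: filename:line references
        '-c',                // copy preceding comments as translator notes
        '--from-code=UTF-8', // source files are UTF-8
        '-k__',              // also extract __("...") calls
        '-L', 'php',         // parse the input files as PHP
        '-p', $outDir,       // output directory
        '-o', $outFile,      // output file name, used exactly as given
    ], $files);

    // escapeshellarg() quotes each argument for the current platform's shell.
    return 'xgettext ' . implode(' ', array_map('escapeshellarg', $args));
}

$cmd = build_xgettext_command(
    ['c:/xampp/htdocs/diafol/includes/*.php', 'c:/xampp/htdocs/diafol/config/*.php'],
    'c:/xampp/htdocs/diafol/lang/orig',
    'mytemplate.pot'
);
echo $cmd, PHP_EOL;

// To run it as well: exec($cmd, $output, $status); a $status of 0 means success.
```

If no file matches a pattern, it simply contributes nothing to the command, so check the echoed command before running it.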

Plural Forms

One thing that I have not covered is plural forms. A simple case is '1 click' versus '2 clicks', where the wording changes with the number of items. This can get quite complicated, as different languages have different plural rules. English is relatively simple: the singular 'click' is used only for the number 1, and 'clicks' for everything else (0, 2, 3, 4, 5...), but other languages may follow a very different pattern.
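In a translated .po file, each language's rule lives in the Plural-Forms header as a C-like expression that maps the count n to an index into msgstr[0], msgstr[1], and so on. To show how that index is chosen, the sketch below mirrors two commonly used expressions in PHP: English (nplurals=2; plural=(n != 1);) and Polish, which has three forms. The function names are hypothetical; at runtime gettext evaluates the header expression itself, so you would never write these in your application.

```php
<?php
// Hypothetical helpers mirroring common Plural-Forms expressions.
// Each returns the index of the msgstr[] entry that would be used for $n.

// English: nplurals=2; plural=(n != 1);
function plural_index_en(int $n): int
{
    return $n != 1 ? 1 : 0;
}

// Polish: nplurals=3;
// plural=(n==1 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2);
function plural_index_pl(int $n): int
{
    if ($n == 1) {
        return 0;
    }
    $lastDigit = $n % 10;
    $lastTwo = $n % 100;
    if ($lastDigit >= 2 && $lastDigit <= 4 && ($lastTwo < 10 || $lastTwo >= 20)) {
        return 1;
    }
    return 2;
}

foreach ([0, 1, 2, 5, 12, 22] as $n) {
    printf("n=%d  en: msgstr[%d]  pl: msgstr[%d]\n", $n, plural_index_en($n), plural_index_pl($n));
}
```

Note how 12 and 22 land in different Polish forms even though both end in 2; this is exactly the kind of rule a simple singular/plural pair can't express.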
So how do we handle plural forms properly? We have to be a bit crafty in our source files and use the ngettext() function. I have to admit, I struggled to find decent resources on this for PHP, but after a bit of fiddling, I believe I found a working method.
If XGETTEXT knows that we're working with PHP files, it automatically picks up the ngettext keyword, so DON'T include it as a keyword on the command line. If you do (as a plain -kngettext), you may find that you produce a single string entry for translation rather than a plural entry. If you ever need to declare a plural function yourself, give the argument positions of the singular and plural strings, e.g. -kngettext:1,2.
Here's a trivial example of ngettext() in a source file:
printf(ngettext("%d file", "%d files", $num), $num);
Once you've run XGETTEXT from the command line, open the new .pot file in a text editor and you should see something like this:
#: c:/xampp/htdocs/timetabler/includes/step1.php:20
#, php-format
msgid "%d file"
msgid_plural "%d files"
msgstr[0] ""
msgstr[1] ""
As far as creating the .pot file via the command line goes, we're done, but let's look at a better example. The following code is pretty poor PHP, but it's just for illustration, with the focus on ngettext() and _().
<?php
//temp display variables
$num = 1;
$username = "diafol";
$email = "diafol@example.com";

//Form label
$user_label = _("Username:");
//Form label
$email_label = _("Email:");
//Form button label
$submit_label = _("Change Email");
//You must include %d as is 
$lastvisit = ngettext("Your last visit was %d day ago", "Your last visit was %d days ago:", $num);
?>
  
<p><?php printf($lastvisit, $num); ?></p>
<form id="profile" name="profile" action="scripts/profile_handler.php" method="post">
  <label for="username"><?php echo htmlspecialchars($user_label); ?></label>
  <input id="username" name="username" value="<?php echo htmlspecialchars($username); ?>" disabled="disabled" />
  <label for="chemail"><?php echo htmlspecialchars($email_label); ?></label>
  <input id="chemail" name="chemail" value="<?php echo htmlspecialchars($email); ?>" />
  <input type="submit" name="submitprofile" id="submitprofile" value="<?php echo htmlspecialchars($submit_label); ?>" />
</form>
With $num set to 1, this code displays the form with the line 'Your last visit was 1 day ago' above it.
If we change the $num variable to:
//temp display variables
$num = 7;
You should now see 'Your last visit was 7 days ago:' above the form instead.
So, we can confirm that printf()/ngettext() is working as expected in the native language. Okay, here comes the magic bit again; let's run XGETTEXT:
xgettext -n -c -D c:/xampp/htdocs/timetabler/includes/*.php c:/xampp/htdocs/timetabler/config/*.php --from-code=UTF-8 -k__ -L php -p c:/xampp/htdocs/timetabler/lang -o messages.pot
This should produce a file messages.pot with something like the following content:
# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR THE PACKAGE'S COPYRIGHT HOLDER
# This file is distributed under the same license as the PACKAGE package.
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2013-01-30 00:18+0000\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=CHARSET\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=INTEGER; plural=EXPRESSION;\n"

#. Form label
#: c:/xampp/htdocs/timetabler/includes/step1.php:8
msgid "Username:"
msgstr ""

#. Form label
#: c:/xampp/htdocs/timetabler/includes/step1.php:10
msgid "Email:"
msgstr ""

#. Form button label
#: c:/xampp/htdocs/timetabler/includes/step1.php:12
msgid "Change Email"
msgstr ""

#. You must include %d as is
#: c:/xampp/htdocs/timetabler/includes/step1.php:14
#, php-format
msgid "Your last visit was %d day ago"
msgid_plural "Your last visit was %d days ago:"
msgstr[0] ""
msgstr[1] ""
We can see the 'Notes for Translators' comments on the lines prefixed with #., the filename:line references on the lines prefixed with #:, the #, php-format flag on the string that contains a printf-style placeholder, and the plural entry with its msgid_plural key.
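To check that each ngettext() call really came out as a plural entry (rather than the single-string entry described earlier), a quick script can list the msgid / msgid_plural pairs. This is a minimal sketch with a hypothetical function name, and it assumes every msgid fits on one line; real PO files may split long strings over several quoted lines, which it doesn't handle.

```php
<?php
// Minimal sketch: list msgid / msgid_plural pairs found in .pot text.
// Assumption: each msgid and msgid_plural sits on a single line.
function list_pot_entries(string $potText): array
{
    $entries = [];
    foreach (preg_split('/\R/', $potText) as $line) {
        if (preg_match('/^msgid "(.*)"$/', $line, $m)) {
            if ($m[1] !== '') { // the empty msgid is the header entry
                $entries[] = ['msgid' => $m[1], 'plural' => null];
            }
        } elseif ($entries && preg_match('/^msgid_plural "(.*)"$/', $line, $m)) {
            // msgid_plural belongs to the entry whose msgid came just before it.
            $entries[count($entries) - 1]['plural'] = $m[1];
        }
    }
    return $entries;
}

// In practice you would pass file_get_contents('messages.pot') instead.
$sample = <<<'POT'
msgid ""
msgstr ""
"Content-Type: text/plain; charset=CHARSET\n"

#. Form label
#: c:/xampp/htdocs/timetabler/includes/step1.php:8
msgid "Username:"
msgstr ""

#. You must include %d as is
#: c:/xampp/htdocs/timetabler/includes/step1.php:14
#, php-format
msgid "Your last visit was %d day ago"
msgid_plural "Your last visit was %d days ago:"
msgstr[0] ""
msgstr[1] ""
POT;

foreach (list_pot_entries($sample) as $entry) {
    $kind = ($entry['plural'] === null) ? 'single' : 'plural';
    echo "[" . $kind . "] " . $entry['msgid'], PHP_EOL;
}
```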
We are now ready to edit the file in order to make it distributable. We should edit certain parts of the messages.pot file directly.

Editing Parts of the .pot file Directly

Open the messages.pot file in a text editor and fill in your own details. For example, the default header:
# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR THE PACKAGE'S COPYRIGHT HOLDER
# This file is distributed under the same license as the PACKAGE package.
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"Report-Msgid-Bugs-To: \n"
Could be changed to this:
# Translation File for diafolCode TIMETABLER. Download the package from http://www.example.com/downloads.
# Copyright (C) 2013 ALAN DAVIES
# This file is distributed under the same license as the diafolCode TIMETABLER package.
# ALAN DAVIES <diafol@example.com>, 2013.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: diafolCode Timetabler 1.01b\n"
"Report-Msgid-Bugs-To: http://www.diafolcode-bugs.example.com\n"
Finally, we've produced our finished messages.pot file. It is now ready for distribution, either as a downloadable .pot file or as part of the whole package, if that is to be distributed.
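Every time you re-run xgettext, the header goes back to its placeholders, so editing it by hand soon gets tedious. One option is a small PHP script that patches the placeholders shown above. This is a sketch only: the function name fill_pot_header is hypothetical, the replacement values are this example's project details, and only the header lines from the example are handled.

```php
<?php
// Sketch: fill in the placeholder header that xgettext writes into a new .pot file.
// The replacement values are this example's project details; change them for yours.
function fill_pot_header(string $pot): string
{
    $replacements = [
        "# SOME DESCRIPTIVE TITLE."
            => "# Translation File for diafolCode TIMETABLER. Download the package from http://www.example.com/downloads.",
        "# Copyright (C) YEAR THE PACKAGE'S COPYRIGHT HOLDER"
            => "# Copyright (C) 2013 ALAN DAVIES",
        "the same license as the PACKAGE package."
            => "the same license as the diafolCode TIMETABLER package.",
        "# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR."
            => "# ALAN DAVIES <diafol@example.com>, 2013.",
        // Single quotes keep \n as the literal backslash-n found in the file.
        '"Project-Id-Version: PACKAGE VERSION\n"'
            => '"Project-Id-Version: diafolCode Timetabler 1.01b\n"',
        '"Report-Msgid-Bugs-To: \n"'
            => '"Report-Msgid-Bugs-To: http://www.diafolcode-bugs.example.com\n"',
    ];
    // strtr() with an array tries longer keys first and never re-scans
    // text it has already substituted.
    return strtr($pot, $replacements);
}

// Usage, with the output location from the example command:
// $path = 'c:/xampp/htdocs/timetabler/lang/messages.pot';
// file_put_contents($path, fill_pot_header(file_get_contents($path)));
echo fill_pot_header('"Project-Id-Version: PACKAGE VERSION\n"'), PHP_EOL;
```

xgettext itself also accepts --copyright-holder and --msgid-bugs-address for two of these fields; run xgettext --help to see which header options your version supports.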

6 comments:

  1. Great tutorial, but gettext 0.14.4 is a very old release! I don't know how to install the latest release with GnuWin32 (in order to use context).

  2. Hi Julien - thanks for commenting. I had a look at the newest release of GnuWin32, but I couldn't find a newer version of gettext than 0.14.4 in it. Do you know of one?

    I went here: http://gnuwin32.sourceforge.net/packages.html

  3. You can get any version of gettext for Windows from http://ftp.gnome.org/pub/gnome/binaries/win32/dependencies/
    To install it you need to download two zip files (tools and binaries). In each zip file there is a /bin folder. Extract both bin folders into a folder that is in your Windows PATH and you are done. You don't need the other files in the ZIPs.

    1. You need the tools and runtime files (not tools and binaries). For example, for version 0.17 download: http://ftp.gnome.org/pub/gnome/binaries/win32/dependencies/gettext-runtime-0.17-1.zip and http://ftp.gnome.org/pub/gnome/binaries/win32/dependencies/gettext-tools-0.17.zip
