Localize.drupal.org to come to life, so what about packaging?

Finally, the promise of a centralized localization interface for Drupal modules and themes looks to be coming true. I started work on this project around two years ago under Google Summer of Code sponsorship and have continued maintaining and improving it ever since. While I was spreading the word on it, not many people signed up to help clean up some possible performance problems, so it did not make it onto Drupal.org yet.

However, earlier this year I got reviews from some key people in the infrastructure team, especially Gerhard Killesreiter, who persuaded me that setting this up is more important than it being perfect. Software is an evolving matter anyway, and we should improve it as we see the problems. So I've started to set up localize.drupal.org. While we work out some of the kinks like single sign-on with drupal.org (one of the promises of the drupal.org redesign which will be delivered here), I thought it would be a good idea to discuss the implications.

So why do we need a localization server?

To recap, the underlying issues prompting us to set up a web interface for localization are manifold:

  • We can increase participation in translation a great deal if we don't require translators to use CVS (or depend on module maintainers to commit translations on the submitter's behalf). Also, translators currently need to generate their translation templates themselves, since the templates generated by module maintainers are outdated. We can eliminate most (in many cases all) of the gettext tools too and let translators just focus on the text.
  • When a module is released, translations are currently packaged with it. Since modules usually make string changes late in the release cycle as well, and usually give translators no advance notice of the release, translations are not synchronized with releases. Translators work after the fact, and their work is only delivered to users when a new release of the module is tagged and built.
  • Although Drupal itself shares translations of strings among modules, the gettext file based solution does not allow for this, except if translators merge the strings directly. It is easily possible that modules translate strings differently or that translators need to translate strings again.
  • CVS does not provide a submission and review workflow for translations. Unlike with module code and patches, there is no way to review translation changes and maintain an (optional) approval workflow. Translation updates are rarely submitted as patches, and asking translators to submit patches would just widen the toolset they have to use yet again.

Despite all the disadvantages of the CVS and gettext .po file based translation system, we should still keep .po files as the transport mechanism, since that is the file format used by all versions of Drupal (including 7) to import and export translations.

So the localization server solves the above problems by using a centralized database of translations where equal strings and their translations are shared among modules (just like in Drupal). There is an optional approval workflow where moderators can approve suggestions, or certain (or all) people can be allowed to change translations of strings directly. It also provides a web service interface on top of this, so Localization client users can submit translations directly from their workflow of translating English strings on their own site.
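To make the shared database idea more concrete, here is a minimal sketch in Python. It is not the localization server's actual schema or code; the class and method names (TranslationStore, register, submit, approve, export) are invented for illustration. It only shows the two properties described above: a source string is stored once regardless of how many releases use it, and suggestions can optionally wait for a moderator.

```python
from dataclasses import dataclass, field


@dataclass
class SourceString:
    """One English source string, stored once however many releases use it."""
    text: str
    releases: set = field(default_factory=set)        # e.g. {"og-6.x-1.3"}
    translations: dict = field(default_factory=dict)  # language -> approved text
    suggestions: dict = field(default_factory=dict)   # language -> pending texts


class TranslationStore:
    """Hypothetical in-memory stand-in for the central translation database."""

    def __init__(self, moderated=True):
        self.moderated = moderated
        self.strings = {}

    def register(self, text, release):
        # The same text coming from two releases maps to the same entry.
        entry = self.strings.setdefault(text, SourceString(text))
        entry.releases.add(release)
        return entry

    def submit(self, text, language, translation):
        entry = self.strings[text]
        if self.moderated:
            # Optional approval workflow: keep it as a suggestion for now.
            entry.suggestions.setdefault(language, []).append(translation)
        else:
            entry.translations[language] = translation

    def approve(self, text, language, translation):
        # Raises if the translation was never suggested for this language.
        entry = self.strings[text]
        entry.suggestions[language].remove(translation)
        entry.translations[language] = translation

    def export(self, release, language):
        # What would end up in the .po file for one release and language.
        return {
            text: entry.translations[language]
            for text, entry in self.strings.items()
            if release in entry.releases and language in entry.translations
        }
```

With this shape, a string registered for two different modules and translated once shows up in the export of both, which is the sharing that separate gettext files cannot give us.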

Since it will run with a single sign-on setup with drupal.org, anybody with a drupal.org account will be able to log in (not yet, though) and contribute. How's that for lowering the barrier from CVS and gettext tools?

What happens to packaging when not in CVS?

Thus the localization of projects on drupal.org will be removed from CVS: .pot and .po files will not be hosted there, but instead generated and packaged independently from the database of localize.drupal.org. And now comes the task that is not solved yet. Projects have branches and releases in those branches, so let's say a module has module-6.x-1.x and module-6.x-2.x branches with releases like module-6.x-1.0, module-6.x-1.4 and module-6.x-2.3. The different releases can have different sets of strings, even on the same branch. This is all nicely flattened on the localization server user interface, so people can filter for specific versions or just translate all versions.

But when thinking of the translation packages for users, we'd need to support translation packages for at least the latest stable versions of these branches, updated as changes to translations are made. Think module-6.x-1.4-translations-1, which could become module-6.x-1.4-translations-2 when updated (if we go with incrementing version numbers), or module-6.x-1.4-translations-20090730, which would become module-6.x-1.4-translations-20090806 next week (if we go with weekly timestamped snapshots of translations). There'd be module-6.x-2.3-translations-X packages as well.
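The two naming options can be sketched in a few lines of Python. This is only an illustration of the naming idea, not packaging code that exists anywhere; the helper names (package_name, next_counter, snapshot) are made up for this example.

```python
import re
from datetime import date, timedelta

# "<release>-translations-<version>", e.g. module-6.x-1.4-translations-2
PACKAGE_RE = re.compile(r"^(?P<release>.+)-translations-(?P<version>\d+)$")


def package_name(release, version):
    """Build a translation package name for a module release."""
    return f"{release}-translations-{version}"


def next_counter(name):
    """Option 1: bump an incrementing translation version number."""
    match = PACKAGE_RE.match(name)
    if match is None:
        raise ValueError(f"not a translation package name: {name!r}")
    return package_name(match["release"], int(match["version"]) + 1)


def snapshot(release, day):
    """Option 2: name the package after the date the snapshot was taken."""
    return package_name(release, day.strftime("%Y%m%d"))


if __name__ == "__main__":
    print(next_counter("module-6.x-1.4-translations-1"))
    first = date(2009, 7, 30)
    print(snapshot("module-6.x-1.4", first))
    print(snapshot("module-6.x-1.4", first + timedelta(days=7)))
```

Either way the module release stays a prefix of the package name, so a tool can tell which release a translation package belongs to without opening it.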

I'm assuming we can leave users of older stable versions without updated translations, since they do not update their code for the bugfixes either. This limits the number of "translation branches" to the number of branches the module has.

These branches need versioning, however. We can either take snapshots at a given interval or let the translation teams push new versions for projects when the time comes. However, since packaging each language separately for each project release would produce a disastrous number of packages, keeping all translations for a module release in one versioned package seems more logical. So all language teams would release an updated translation at the same time. While this might sound limiting, it is still way ahead of the current situation where a module maintainer chooses the release date.

Ok, then how do site maintainers update translations?

Let's consider Drupal 7 first. If we think of translations as separate packages from modules, themes and install profiles, then we'd need some kind of container for them in the Drupal file system. A top level "translations" directory maybe. As long as we version translations (and we have .info files for them), we can rely on the update status module to provide update information, and then people with tools like Drush or Plugin manager can update their translations that way. Translations are special among Drupal projects in that they only serve as a transport mechanism for database data; there is no living code, so we'd only use the packaging and updating infrastructure to facilitate the versioning. What might sound tricky here is that we'd always need to grab the latest version of translations for the versions of Drupal modules we use. A great relief is that Drupal 7 just started to support version level dependencies today, so we can say what module version our translation is dependent on, and the right one can be picked.
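As a rough sketch of what "the right one can be picked" means, assume each translation package declares the single module release it depends on. The Python below is not Drupal's update status logic; the TranslationPackage type and the pick_translations function are hypothetical names used only to show the selection rule: ignore packages for releases you don't run, then take the newest of what is left.

```python
from typing import NamedTuple


class TranslationPackage(NamedTuple):
    project: str   # e.g. "og"
    requires: str  # module release this package depends on, e.g. "6.x-1.4"
    version: int   # translation version (a counter or a YYYYMMDD snapshot)


def pick_translations(available, installed):
    """Return the newest matching package per installed project.

    available: iterable of TranslationPackage
    installed: dict mapping project name -> installed release
    """
    best = {}
    for package in available:
        # Skip packages built for a release this site does not run.
        if installed.get(package.project) != package.requires:
            continue
        current = best.get(package.project)
        if current is None or package.version > current.version:
            best[package.project] = package
    return best


if __name__ == "__main__":
    available = [
        TranslationPackage("og", "6.x-1.4", 20090730),
        TranslationPackage("og", "6.x-1.4", 20090806),
        TranslationPackage("og", "6.x-2.3", 20090806),
    ]
    print(pick_translations(available, {"og": "6.x-1.4"}))
```

A site running an older release simply finds no newer package for it, which matches the assumption that only the latest stable release on each branch keeps getting translation updates.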

What happens to Drupal 6? One option is that we keep supporting the CVS based translation interface for Drupal 6, but that would require that people actually commit translations from time to time from the localize.drupal.org database to the CVS repository (individually per project release). That sounds quite painful, so we can maybe take a cue from the Drupal 7 ideas (which were, by the way, based on the clever way Features packages its contents and how Drupal 7 install profiles provide their dependencies). So we can pretend that translations are some type of project Drupal supports (probably a module) and build a glue module which would hide them on the modules page and support version level dependencies for them on Drupal 6. We could then roll this system out incrementally, and migrate multilingual sites over to it. We can combine this with letting people commit .po files to CVS, so those not willing to migrate to a new system to get translations will get some stuff, but will probably be quickly left out in the cold if the new system proves to be as convenient as it looks.

Let's talk!

Ok, this might sound a bit crazy, but we are reinventing how translations work, and while I do not have a live localize.drupal.org instance to guide you through for more background, we should figure this out while Drupal 7 is open for development, so we have an updated translation deployment system in place. There are numerous existing localization server instances on the internet used by various translation teams, so you can check those out for now.

Wouldn't you like to just select from a list of languages pulled from a remote web service during Drupal installation (if an internet connection is available) and have Drupal download the translations you picked for you? Would you like automated translation updates when modules update (which does not happen in Drupal 6)?

Let's reality check the above plan so we can run down this path and deliver streamlined localization support in Drupal 7 (and maybe even backport much of the goodness to Drupal 6 via contributed modules)!


Boris Mann's picture

The DA budget includes a line item for bundling language-specific downloads of Drupal. Sounds like this is part of that. So, I would aim for that as the short term target, and know that there is some budget available.

I agree that hiding Translations in a "module wrapper" is probably the best way to handle it. Extra code in update.php to handle translation upgrades? e.g. a list of upstream updated strings vs. locally changed, accept/merge/etc.

Psicomante's picture

Great news for Drupal translation teams! Waiting for any update ;)

Michael Prasuhn's picture

Maybe I'm missing something here, but if the translations exist in the DB on localize.drupal.org and then end up in the DB of a given Drupal site – then why not just expose the translations via web services and build a module that will import them into the local DB from localize.drupal.org?

This seems to make much more sense to me, as all this extra packaging for 'data' seems redundant. It seems almost everyone is tackling the issue of downloading and updating modules from within Drupal, which is a potential security issue because of code, but translations aren't code, just data, so it seems something like an 'auto download/import' feature would be a perfect fit here.

Gábor Hojtsy's picture

1. We definitely need packaged translations for the reasons mentioned by Boris for example: packaging language distributions so people can start off easily.

2. Not everyone's Drupal instance has a live connection to the internet. The assumption that people would just be able to download and update via a web service does not hold.

3. Serious sites would not update translations via a web service but use development and staging servers. Data stored in the database is already an issue for updates from the development to the staging to the live servers. So data stored on the file system is generally preferred. See, for example, the Features module's approach to packaging exportables, so you can bundle Views, CCK fields, etc. as packages and use them in a deployment workflow.

4. We already have infrastructure to version packages, detect when they need updates and update them with Drush (command line), Aegir (outside of Drupal) and Plugin manager (inside of Drupal), so coming up with a custom way would not fit with all the existing tools we have, resulting in the need for even more development.

sDaniel's picture

All valid reasons, but I am pretty sure that those reasons don't apply to most Drupal sites, as most Drupal users don't build "serious sites". So while we do need packaged translations in the first place, for most Drupal users a "click here to get the latest translation" solution or something similar would probably be most desirable.

Jose Reyero's picture

Starting from this premise, that translations are not code, there are things we don't really need, like enabling more write permissions to get translations downloaded and imported.

Also, as you point out, CVS versioning is of very little use here, so I'd lean more towards simple files and some meta information about updates. We can save all the CVS packaging overhead.

Something else I wouldn't like to be required for automatically updating translations is the plugin manager. You can call me paranoid, but there is no way I am typing my SSH key into a web page just to get my translations updated.

How I think it should work is actually already implemented in latest Open Atrium version: Translations are automatically downloaded by the installer upon request and there's a 'Drupal core update'- like system in place for updating them.

Really, this is (live, ready to use) all about my proposal for Drupal translations: http://github.com/developmentseed/Atrium/tree/master

Gábor Hojtsy's picture

I'm not thinking plugin manager, I'm thinking all the Drupal tools already available like Aegir, Drush, etc, including plugin manager. If we build a parallel system, translations will be "alien" to these tools, and they'd need special handling for those as well. If you don't like plugin manager, ask how your system matches to Drush and Aegir.

One great achievement of the Drupal 6 translation system was that we are not dealing with a huge blob of $languagecode.po anymore, which many shared hosts cannot even import at once due to PHP runtime limits. The translations are broken down by Drupal package and then, under packages, they are broken down by submodules. People often use huge module sets like Organic groups or Ubercart, while they only actually use some parts of them, so we do not import all translations of all submodules. I think it would be a pity to lose this great optimization (both in terms of site performance and translator performance, since with the D6 method, you don't see any strings for modules you don't use).

So a central system like the one you suggest, when implemented for all of Drupal contrib, would need to handle requests like: "I need the latest Hungarian translation for the 6.x-1.3 version of og_panels from the organic groups package". Going through all the languages of your site and submodules of projects with web service requests, or doing all this in one web service request, sounds like a huge number of totally custom queries for the service to run at any given time, which does not sound scalable to all the users who would tap into it. So I think it would be natural to think of bigger but static packages and let individual sites figure out what exactly they need from them. (Not unlike how modules and themes themselves are distributed.) That would distribute the load to the sites themselves, while the central service can just spit out static files, which can be served by a server with low resource consumption. Then the clients can reuse the exact update_status (not a custom copy of that), Drush, Aegir and all that.
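Here is a small Python sketch of the "let individual sites figure out what they need" part. The package layout it assumes, one "<submodule>/<language>.po" file per submodule and language, is invented for the example and is not how packages are laid out today; the function name files_to_import is also made up.

```python
from pathlib import PurePosixPath


def files_to_import(package_files, enabled_submodules, site_languages):
    """Pick the .po files a site actually needs from one static package.

    Assumes a hypothetical layout of "<submodule>/<language>.po" inside
    the package; anything else in the file list is ignored.
    """
    wanted = []
    for name in package_files:
        path = PurePosixPath(name)
        if path.suffix != ".po" or len(path.parts) != 2:
            continue
        submodule, language = path.parts[0], path.stem
        if submodule in enabled_submodules and language in site_languages:
            wanted.append(name)
    return sorted(wanted)


if __name__ == "__main__":
    package = [
        "og/hu.po", "og/de.po",
        "og_panels/hu.po", "og_panels/de.po",
        "og_views/hu.po",
        "README.txt",
    ]
    # A Hungarian site that only enabled og and og_panels.
    print(files_to_import(package, {"og", "og_panels"}, {"hu"}))
```

The central server only ever serves the same static file to everybody, and the filtering by language and enabled submodule costs each site a loop over a file list.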

Jose Reyero's picture

Though reusing existing tools sounds appealing, all these tools need shell access one way or another, while for downloading and dumping data into the DB we don't need it at all.

I think if it takes building new tools to save end users from extra requirements, we just build new tools (which anyway are already built).

Think of publishing plain .po files. Then you don't even need to uncompress tar.gz packages on the client side. More: you don't even need to download the file at all, you can open the remote file and import it.

So what I see is that by reusing the existing tools (and anyway we'd need to build some more to handle translation packaging) we are adding new requirements to client sites just to update the translations, when we could be saving a lot of people a lot of trouble...

akisp's picture

I'm an end user and translator.

Gallery 3 is in development at this time. It seems that it uses your approach, if not the code itself, to solve its localization issue. It would be nice if you could take a look at it.

Before this rebuild, they used a more hard-coded method than Drupal!

Now an end user has a choice on the admin panel to start translating into his language, and a second choice where he is asked if he wants to send the translation back to the Gallery 3 site so that it becomes available to others.

The translations are currently made available to end users by selecting their languages on a panel and clicking on update, which brings them the available translations for the core system and the extra enabled modules.

"While I was spreading the word on it, not many people signed up to help clean up some possible performance problems, so it did not make it onto Drupal.org yet."

As per the above quote, I need an explanation: are you referring to the linked page on your site or to the project page? I haven't seen anything like the above article on the front page of drupal.org, where I believe it should be published!

andypost's picture

Please don't forget that most multi-language sites have their own translations that depend on the site's subject area, so direct translation updates on live sites are NOT possible.

On the other hand, at the development stage it's good to download only the translations that are needed. This way the size of downloaded modules will be reduced, I think sometimes several times over!

Another idea is to store the modified translations of site strings for local projects on a local l10n_server setup and have the ability to sync them.

bmagalhaes's picture

There is a project, http://drupal.org/project/live_translation. Why doesn't that module integrate with the localization server?

It's very good for those who manage many sites.

Sorry for my bad English.

Bruno de Oliveira Magalhães

Trout's picture

Gábor, you should check out this localization service: https://poeditor.com/. It's perfect for collaborative translations and it's nicer to use than many other services I've tried out.
