
Thread: updating translations: how valuable is user data after all?




updating translations: how valuable is user data after all?
user name
2007-05-22 04:53:19
Hi guys,

Now Drupal 6.x-dev includes cool features to import PO files
automatically at every logical step:

  - you can install Drupal in your foreign language, and have
    PO files for all enabled modules imported along automatically

  - when you add a new language, all translation files for
    enabled modules get imported automatically for that language

  - when you install new modules or enable themes, the translation
    files for these components get imported for all enabled
    languages

This is all great and automated, contrib modules already have their PO
files at the right place, and we will update the packaging scripts for
Drupal 6 to package core translations properly.

You might notice a pattern in the above features though: they IMPORT
stuff into the database. Unfortunately we have no way in Drupal 6 to
remove translations when you disable a theme or uninstall a module. We
don't know which strings appeared in *only* that component and not
elsewhere in Drupal, which is what we would need to know to remove them
without problems. For that, we would need the extractor script built
into Drupal core to look through all source files of enabled components
and identify the unused strings in the database. Fortunately this is
doable in contrib, now that the extractor has its own Drupal module. (Of
course it is doable in Drupal core by deleting all strings from the
database and reimporting files for only the enabled components, but
read on about the value of user data).
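
A minimal sketch of that cleanup idea, in Python rather than Drupal's
PHP. It assumes the extractor has already produced the set of source
strings used by each component; the function name and the sample data
are hypothetical, not part of Drupal or of the extractor module.

```python
def find_unused_strings(stored_strings, component_strings, enabled):
    """Return stored source strings that no enabled component uses.

    stored_strings: source strings currently in the database
    component_strings: component name -> set of strings the extractor found
    enabled: names of the components that are still enabled
    """
    in_use = set()
    for name in enabled:
        in_use.update(component_strings.get(name, ()))
    return set(stored_strings) - in_use


# Hypothetical extractor output for three components.
component_strings = {
    "node": {"Save", "Delete", "Title"},
    "forum": {"Save", "Forum topic"},
    "garland": {"Search"},
}
stored = {"Save", "Delete", "Title", "Forum topic", "Search"}

# The forum module has been disabled; "Save" survives because node uses it too.
unused = find_unused_strings(stored, component_strings, enabled=["node", "garland"])
print(sorted(unused))  # ['Forum topic']
```

Working from the enabled components, instead of from the one being
removed, is what keeps shared strings such as "Save" in the database.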

BTW Drupal 6 core still needs upgrade support for translations, so that
when you update a module or Drupal itself, new and corrected
translations get into your database. New translations are easy again,
they are just importing new stuff, which we are very good at. Updating
translations already in the DB threatens user data though. In Drupal 5
and before, we have no information about which translations a user
modified on the web interface, so we don't know what was imported from
available PO files and what was user defined. We can reimport stuff from
the files, but can easily lose/overwrite user defined/updated strings.

What can we do to avoid losing user defined strings? We can easily
introduce a 'modified' bit into the locale translations (target) table,
just as it was in menu module in Drupal 5. That would help us from
Drupal 6 onward, but it does not keep us from losing user defined
strings when a Drupal 5 to Drupal 6 upgrade happens. So how cautious
should we be there?

   1. Do not overwrite any existing translation, risking that we leave
   incorrect translations in the database even though they have since
   been fixed in the PO files.

   2. Do overwrite existing translations on an update, risking that
   we overwrite user modified translations.

Note that an update will not *remove* anything from the DB because we
don't know what we can remove as explained above. It can *overwrite*
stuff though, and problems are around these overwrites.
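
To make the two options and the proposed 'modified' bit concrete, here
is a rough sketch in Python (not Drupal's PHP). The `Translation` class,
the `import_po` function and the sample strings are all hypothetical. A
dictionary stands in for the locale tables.

```python
from dataclasses import dataclass


@dataclass
class Translation:
    text: str
    modified: bool = False  # hypothetical flag: True once edited on the web UI


def import_po(db, po_entries, overwrite):
    """Import PO entries into db (source string -> Translation).

    New strings are always added. Existing strings are replaced only when
    overwrite is True and the stored translation is not flagged as modified.
    Nothing is ever removed.
    """
    for source, text in po_entries.items():
        existing = db.get(source)
        if existing is None:
            db[source] = Translation(text)
        elif overwrite and not existing.modified:
            existing.text = text
    return db


db = {
    "Save": Translation("old team wording"),
    "Delete": Translation("site owner's wording", modified=True),
}
po = {"Save": "corrected team wording", "Delete": "team wording", "Title": "new string"}

import_po(db, po, overwrite=True)
for source, entry in db.items():
    print(source, "->", entry.text)
```

With `overwrite=False` this behaves like option 1, and with
`overwrite=True` like option 2, except that flagged rows are protected.
Rows coming from a Drupal 5 site would carry no reliable flag, which is
exactly the upgrade problem described above.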

So how should the update paths work for Drupal and for modules/themes?

Gabor

Re: updating translations: how valuable is user data after all?
user name
2007-05-22 14:34:58
What if the update script compared the PO imported strings to the ones in the database? If they are identical, the string would be marked as "not modified" in the newly added modified bit; otherwise it would be marked as modified. Would that work?

Re: updating translations: how valuable is user data after all?
user name
2007-05-22 14:47:50
Ashraf Amayreh wrote:
> What if the update script compared the PO imported strings to the ones
> in the database, if they are identical it would be marked as "not
> modified" in the newly added modified bit, else, it would be marked as
> modified. Would that work?

Well, the idea of the modified bit would be to mark what you modified on
the web interface, so we can protect them from later modifications. The
fact that an updated translation contains modified strings does not mean
you ever touched the original ones. Keeping them in the database would
conserve the bad translations the teams try to update. This reuse of the
modified bit would be against the intended role of the modified bit.

Of course we can reposition the suggested modified bit, but then it will
not serve a protection role, just some kind of notification that the
version in the system is different from the one in the PO files. How
would that serve the user? How would users clean up this stuff, to
distinguish between translation team updated strings and real user
modified translations?

Gabor

Re: updating translations: how valuable is user data after all?
user name
2007-05-22 15:44:42
hmm... we would know of any web modified values if we could theoretically gather all the PO files that were used on a drupal 5 installation, right? I know little info about PO files so apologies if I'm making any wrong assumptions here. I've seen PO files distributed in modules which would make this work, but I don't know how applicable this is if an external PO file was used or so forth. Any ideas on applicability of this?

In the worst case, doing this technique on existing PO files is still better as it will recognize some strings as "not modified" rather than either considering all strings as "modified" or "not modified". It's still better to use what PO files exist to get the least error margin I guess.

Also, what if the modified int (not bit) had a third "unknown" value, where the user could go and resolve these words? I'm really just babbling some random thoughts here.

Re: updating translations: how valuable is user data after all?
user name
2007-05-22 15:53:25
Ashraf Amayreh wrote:
> hmm... we would know of any web modified values if we could
> theoretically gather all the PO files that were used on a drupal 5
> installation, right? I know little info about PO files so apologies if
> I'm making any wrong assumptions here. I've seen PO files distributed in
> modules which would make this work, but I don't know how applicable this
> is if an external PO file was used or so forth. Any ideas on
> applicability of this?

When you update from Drupal 5 to Drupal 6, you throw away your Drupal 5
code and you get the new Drupal 6 code. We will try to educate people to
grab the core language pack too before the upgrade. The problem is that
we don't have the PO files used on the Drupal 5 site, if any... Many
people translate modules they have no translation for on the web
interface, because 'it is there'.

> In worst case, doing this technique on existing PO files is still better
> as it will recognize some strings as "not modified" rather than either
> considering all strings as "modified" or "not modified". It's still
> better to use what PO files exist to get the least error margin I guess.

If only we had the PO files used before. With most Drupal 5 sites, the
PO files are not in the webroot, because people were told to import
translations by uploading a big PO file. So they have the data in the
DB, but it is not likely that they have the exact source PO file.

> Also, what if the modified int (not bit) had a third "unknown" value,
> where the user could go and resolve these words? I'm really just
> babbling some random thoughts here.

Hm, maybe we could do that for this interim period. Question is what UI
can help users here. It should be simple and quick. (We can have a few
languages with two dozen modules easily, which could result in many
"unknown state" translations).

Gabor

Re: updating translations: how valuable is user data after all?
user name
2007-05-22 16:12:16
As to replacing the files with the drupal 6 files. There's nothing preventing us from creating a two step scenario here, one would alter the table and set the flags on the initial drupal 5 installation, thus using the existing PO files (if any!). The other update would continue the processing (if needed) when the new drupal 6 files are in place.

So then the process would go like this:

1. Every string MATCHING an existing string in a PO file would be set to "not modified"
2. Every string NOT MATCHING an existing string in a PO file would be set to "modified"
3. Every string that doesn't exist in a PO file would be set to "modified"
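
The three rules above can be sketched roughly as follows. This is only
an illustration in Python with made-up names and data; it assumes the
PO files that were used on the Drupal 5 site are still available, which
is the open question in this thread.

```python
def classify(db_translations, po_translations):
    """Assign a status to every translation stored in the database.

    db_translations: source string -> translation currently in the database
    po_translations: source string -> translation found in the PO files
    """
    status = {}
    for source, text in db_translations.items():
        if po_translations.get(source) == text:
            status[source] = "not modified"  # rule 1: matches the PO file
        else:
            status[source] = "modified"      # rules 2 and 3: differs or absent
    return status


# Hypothetical data: one untouched string, one edited, one not in any PO file.
db_translations = {"Save": "A", "Delete": "B (edited)", "Forum topic": "C"}
po_translations = {"Save": "A", "Delete": "B"}

status = classify(db_translations, po_translations)
for source in sorted(status):
    print(source, "->", status[source])
```

Rules 2 and 3 collapse into one branch because a string that is missing
from the PO files can never equal a PO translation.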

Actually, this really makes sense, the user who's upgrading will most likely be content with the translations as they appear on his drupal 5 site, so it would be most reasonable to set a non-matching/non-existent string as modified to avoid overwriting it in future PO file dumps.

By the way, how does drupal currently distinguish PO imported strings from web-modified strings? Or does it just overwrite everything on a PO dump?

Re: updating translations: how valuable is user data after all?
user name
2007-05-22 17:08:42
Ashraf Amayreh wrote:
> As to replacing the files with the drupal 6 files. There's nothing
> preventing us from creating a two step scenario here, one would alter
> the table and set the flags on the initial drupal 5 installation, thus
> using the existing PO files (if any!). The other update would continue
> the processing (if needed) when the new drupal 6 files are in place.

Well, for how many of your Drupal 5 sites do you have the PO files
handy? Even if you have them, one of the reasons for introducing split
(smaller) PO files in Drupal 6 is that we cannot read and parse a big PO
file all at once if we are running under tight server resources (like a
busy server with a default PHP time limit).

> So then the process would go like this:
>
> 1. Every string MATCHING an existing string in a PO file would be set to
> "not modified"
> 2. Every string NOT MATCHING an existing string in a PO file would be
> set to "modified"
> 3. Every string that doesn't exist in a PO file would be set to "modified"
>
> Actually, this really makes sense, the user who's upgrading will most
> likely be content with the translations as they appear on his drupal 5
> site, so it would be most reasonable to set a non-matching/non-existent
> string as modified to avoid overwriting it in future PO file dumps.
>
> By the way, how does drupal currently distinguish PO imported strings
> from web-modified strings? Or does it just overwrite everything on a PO
> dump?

There is nothing to distinguish them, this is our upgrade problem, and
it will stay our upgrade problem if we don't do something about it.

Gabor

Re: updating translations: how valuable is user data after all?
user name
2007-05-22 17:50:00
The only two options I see in this case are either to give them a status of "unknown" or to somehow query the user for the behavior he wants during the update (either to preserve them, modified bit set, or to make them overridable, modified bit unset). How we can query a user during an update is a mystery to me.

I would go for creating a third status of unknown as this will open up other possibilities later on since they will at least be distinguishable from normal strings. It will also buy some time if the code freeze is "frozen".

What we could do with them after that is open to suggestions, but I can think of a scenario where importing future PO files could check for their presence and then set their status to "modified"/"not modified" according to whether they match the PO file strings or not. There's really no totally clean way to solve this. Any assumptions on behalf of the user could have really bad consequences.
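
That scenario could look roughly like the sketch below. It is written in
Python with hypothetical names (`Status`, `import_po`) and made-up
sample strings. It also inherits the weakness already raised in this
thread: a mismatch may only mean that the translation team has updated
the string since, not that the user touched it.

```python
from enum import Enum


class Status(Enum):
    NOT_MODIFIED = 0
    MODIFIED = 1
    UNKNOWN = 2  # hypothetical third value for strings carried over from Drupal 5


def import_po(db, po_entries):
    """db maps source string -> [translation, Status]; updated in place."""
    for source, po_text in po_entries.items():
        if source not in db:
            db[source] = [po_text, Status.NOT_MODIFIED]
            continue
        entry = db[source]
        if entry[1] is Status.UNKNOWN:
            # Resolve the unknown state by comparing with the PO file.
            entry[1] = Status.NOT_MODIFIED if entry[0] == po_text else Status.MODIFIED
        if entry[1] is Status.NOT_MODIFIED:
            entry[0] = po_text  # safe to refresh from the PO file
    return db


db = {
    "Save": ["team wording", Status.UNKNOWN],
    "Delete": ["site owner's wording", Status.UNKNOWN],
    "Search": ["never in any PO file", Status.UNKNOWN],
}
po = {"Save": "team wording", "Delete": "team wording for delete", "Title": "new string"}

import_po(db, po)
for source, (text, status) in db.items():
    print(source, "->", text, status.name)
```

Strings that never show up in an imported PO file, like "Search" here,
simply stay in the unknown state until the user resolves them.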



Re: updating translations: how valuable is user data after all?
user name
Israel
2007-05-23 04:44:15
On Tuesday 22 May 2007, 12:53, Gabor Hojtsy wrote:
> Hi guys,
>
> Now Drupal 6.x-dev includes cool features to import PO files
> automatically at every logical step:
>
> [snip...]
>
> Note that an update will not *remove* anything from the DB because we
> don't know what we can remove as explained above. It can *overwrite*
> stuff though, and problems are around these overwrites.
>
> So how should the update paths work for Drupal and for modules/themes?
>

Forgive me if this is a dumb question that has been discussed before..

How come Drupal does not use the native gettext 'mo' format (binary po)
for strings translation?
Why is the process of copying strings from the 'po' into the database
needed? Is it meaningful in terms of performance?

Taking this a bit further, if 'mo' files were used *instead* of the
database, this problem could be easily solved by just letting the web
interface put strings in the database which will have precedence over
the 'mo' strings.

Again, sorry if this is way off.. I was just wondering about this issue
ever since I met with Drupal..

--Yuval

Re: updating translations: how valuable is user data after all?
user name
2007-05-23 12:57:27
יובל האגר wrote:
> Forgive me if this is a dumb question that has been discussed before..
>
> How come Drupal does not use the native gettext 'mo' format (binary po)
> for strings translation?
> Why is the process of copying strings from the 'po' into the database
> needed? Is it meaningful in terms of performance?

1. We should not expect the PHP version of Drupal users to have the
gettext extension loaded, to build on that. Gettext is not a common
extension installed with PHP as far as we heard/imagine (no hard
evidence though).

2. Anyway, actually no one has implemented a gettext extension based
locale module, so that we could benchmark its performance against the
current implementation. (It could be slower or quicker, we don't know).
But see the previous point.

3. Finally, no one has come around to implement a MO reader and handler
in PHP (not using the gettext extension) and prove that it is better
than using the database. (Gerhard has some itch to scratch here if I
understand it right, so it might happen. No hard date on it though, and
probably as a contrib module first).

> Taking this a bit further, if 'mo' files were used *instead* of the
> database, this problem could be easily solved by just letting the web
> interface put strings in the database which will have precedence over
> the 'mo' strings.

We would not really need the database then. If we have mo reader code,
mo writing is not far away. We could use a mo file for user modified
strings.
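
The precedence idea itself is simple, whatever the storage ends up
being. A small sketch in Python, with plain dictionaries standing in
for the packaged (MO) strings and for the user modified strings; the
function names and sample strings are hypothetical, and no MO parsing is
shown.

```python
from collections import ChainMap


def make_catalog(user_overrides, packaged):
    """Build a lookup in which user modified strings win over packaged ones."""
    return ChainMap(user_overrides, packaged)


def translate(catalog, source):
    """Return the translation, or the source string itself if none exists."""
    return catalog.get(source, source)


packaged = {"Save": "packaged Save", "Delete": "packaged Delete"}
user_overrides = {"Delete": "my Delete"}

catalog = make_catalog(user_overrides, packaged)
print(translate(catalog, "Save"))    # packaged Save
print(translate(catalog, "Delete"))  # my Delete
print(translate(catalog, "Title"))   # Title

# Updating the packaged translations replaces only the second layer,
# so the user's own wording is never overwritten.
updated = make_catalog(user_overrides, {"Save": "corrected Save", "Delete": "corrected Delete"})
print(translate(updated, "Delete"))  # my Delete
```

Because the two layers are kept apart, an update never needs to know
which strings were touched by the user.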

Summary: yes, this is a possibility, but no one has explored it yet and
proved it is superior to what we do now.

Gabor
