Thread: updating translations: how valuable is user data after all?
|
|
| updating translations: how valuable is
user data after all? |

|
2007-05-22 04:53:19 |
Hi guys,
Now Drupal 6.x-dev includes cool features to import PO files
automatically at every logical step:
- you can install Drupal in your foreign language, and have PO files
  for all enabled modules imported along automatically
- when you add a new language, all translation files for enabled
  modules get imported automatically for that language
- when you install new modules or enable themes, the translation
  files for these components get imported for all enabled languages
This is all great and automated. Contrib modules already have their PO
files in the right place, and we will update the packaging scripts for
Drupal 6 to package core translations properly.
You might notice a pattern in the above features though: they IMPORT
stuff into the database. Unfortunately we have no way in Drupal 6 to
remove translations when you disable a theme or uninstall a module. We
don't know which strings appear in *only* that component and nowhere
else in Drupal, so we cannot tell which ones could be removed without
problems. For that, we would need the extractor script built into
Drupal core to look through all source files of the enabled components
and identify the unused strings in the database. Fortunately this is
doable in contrib, now that the extractor has its own Drupal module.
(Of course it is also doable in Drupal core by deleting all strings
from the database and reimporting the files for only the enabled
components, but read on about the value of user data.)
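A minimal sketch of the "identify the unused strings" step, written in
Python for brevity rather than as Drupal code. It assumes the extractor
has already produced a set of source strings per component; the
function name and the sample component names and strings are
hypothetical.

```python
def orphaned_strings(strings_by_component, enabled):
    """Return database strings that no enabled component still uses."""
    # Union of every string extracted from the enabled components' sources.
    still_used = set()
    for component in enabled:
        still_used |= strings_by_component.get(component, set())
    # Everything the database knows about, from enabled and disabled
    # components alike.
    in_database = set().union(*strings_by_component.values())
    return in_database - still_used


# Hypothetical extractor output: component name -> source strings.
strings_by_component = {
    "system": {"Save", "Cancel", "Home"},
    "forum": {"Save", "Post new topic"},
    "garland": {"Home", "Fluid width"},
}

# 'forum' was just disabled. "Save" must survive because 'system'
# still uses it, so only the forum-specific string is reported.
print(sorted(orphaned_strings(strings_by_component, ["system", "garland"])))
```

The point of the set difference is that a string is only removable when
no enabled component shares it, which is exactly the information a
plain import does not record.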
BTW Drupal 6 core still needs upgrade support for translations, so
that when you update a module or Drupal itself, new and corrected
translations get into your database. New translations are easy again:
they are just importing new stuff, which we are very good at. Updating
translations already in the DB threatens user data though. In Drupal 5
and before, we have no information about which translations a user
modified on the web interface, so we don't know what was imported from
available PO files and what was user defined. We can reimport stuff
from the files, but can easily lose or overwrite user-defined or
user-updated strings.
What can we do to avoid losing user-defined strings? We can easily
introduce a 'modified' bit into the locale translations (target)
table, just as it was in the menu module in Drupal 5. That would help
us from Drupal 6 onward, but it does not keep us from losing
user-defined strings when a Drupal 5 to Drupal 6 upgrade happens. So
how cautious should we be there?
1. Do not overwrite any existing translation, risking that we leave
   incorrect translations in the database even though they have been
   fixed in the PO files.
2. Do overwrite existing translations on an update, risking that we
   overwrite user-modified translations.
Note that an update will not *remove* anything from the DB, because we
don't know what we can remove, as explained above. It can *overwrite*
stuff though, and the problems are around these overwrites.
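A minimal sketch of how a 'modified' bit could interact with the two
options, in Python rather than Drupal code. The dict-based "table", the
function names and the placeholder translations are all assumptions
made for illustration. Note that it only covers the Drupal 6 onward
case, where the bit has been recorded from the start.

```python
def import_translations(db, po_entries, overwrite=True):
    """Merge PO entries into db (source -> {"translation", "modified"}).

    Returns the source strings whose stored translation was overwritten.
    Nothing is ever removed from db.
    """
    overwritten = []
    for source, translation in po_entries.items():
        row = db.get(source)
        if row is None:
            # New strings are always safe to import.
            db[source] = {"translation": translation, "modified": False}
        elif row["translation"] == translation:
            continue  # nothing to update
        elif overwrite and not row["modified"]:
            # Option 2, but restricted to rows the user never touched.
            row["translation"] = translation
            overwritten.append(source)
        # Otherwise keep the stored value: either option 1 is in force
        # (overwrite=False) or the user edited this row.
    return overwritten


def edit_on_web(db, source, translation):
    """Record an edit made through the web interface."""
    db[source] = {"translation": translation, "modified": True}


db = {}
import_translations(db, {"Save": "save-v1", "Home": "home-v1"})
edit_on_web(db, "Home", "home-custom")
changed = import_translations(
    db, {"Save": "save-v2", "Home": "home-v2", "Cancel": "cancel-v1"}
)
print(changed, db["Home"]["translation"])
```

With the bit in place the updated "Save" translation gets in, the new
"Cancel" string is added, and the user's own "Home" translation
survives.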
So how should the update paths work for Drupal and for modules/themes?
Gabor
|
|
| Re: updating translations: how valuable
is user data after all? |

|
2007-05-22 14:34:58 |
|
What if the update script compared the strings imported from the PO files to the ones in the database? If they are identical, the string would be marked as "not modified" in the newly added modified bit; otherwise it would be marked as modified. Would that work?
|
| Re: updating translations: how valuable
is user data after all? |

|
2007-05-22 14:47:50 |
Ashraf Amayreh wrote:
> What if the update script compared the PO imported strings to the ones
> in the database, if they are identical it would be marked as "not
> modified" in the newly added modified bit, else, it would be marked as
> modified. Would that work?
Well, the idea of the modified bit would be to mark what you modified
on the web interface, so we can protect those strings from later
modifications. The fact that an updated translation contains modified
strings does not mean you ever touched the original ones. Keeping them
in the database would conserve the bad translations the teams try to
update. This reuse of the modified bit would be against its intended
role.
Of course we can repurpose the suggested modified bit, but then it
will not serve a protection role, just some kind of notification that
the version in the system is different from the one in the PO files.
How would that serve the user? How would users clean up this stuff, to
distinguish between strings updated by the translation team and real
user-modified translations?
Gabor
|
|
| Re: updating translations: how valuable
is user data after all? |

|
2007-05-22 15:44:42 |
|
Hmm... we would know of any web-modified values if we could theoretically gather all the PO files that were used on a Drupal 5 installation, right? I know little about PO files, so apologies if I'm making any wrong assumptions here. I've seen PO files distributed in modules, which would make this work, but I don't know how applicable this is if an external PO file was used and so forth. Any ideas on the applicability of this?
In the worst case, applying this technique to the existing PO files is still better, as it will recognize some strings as "not modified" rather than considering all strings as either "modified" or "not modified". It's still better to use whatever PO files exist to get the smallest error margin, I guess.
Also, what if the modified int (not bit) had a third "unknown" value, where the user could go and resolve these words? I'm really just babbling some random thoughts here.
|
| Re: updating translations: how valuable
is user data after all? |

|
2007-05-22 15:53:25 |
Ashraf Amayreh wrote:
> hmm... we would know of any web modified values if we could
> theoretically gather all the PO files that were used on a drupal 5
> installation, right? I know little info about PO files so apologies if
> I'm making any wrong assumptions here. I've seen PO files distributed in
> modules which would make this work, but I don't know how applicable this
> is if an external PO file was used or so forth. Any ideas on
> applicability of this?
When you update from Drupal 5 to Drupal 6, you throw away your Drupal 5
code and you get the new Drupal 6 code. We will try to educate people
to grab the core language pack too before the upgrade. The problem is
that we don't have the PO files used on the Drupal 5 site, if any...
Many people translate modules they have no translation for on the web
interface, because 'it is there'.
> In worst case, doing this technique on existing PO files is still better
> as it will recognize some strings as "not modified" rather than either
> considering all strings as "modified" or "not modified". It's still
> better to use what PO files exist to get the least error margin I guess.
If only we had the PO files used before. On most Drupal 5 sites, the
PO files are not in the webroot, because people were told to import
translations by uploading a big PO file. So they have the data in the
DB, but it is not likely that they have the exact source PO file.
> Also, what if the modified int (not bit) had a third "unknown" value,
> where the user could go and resolve these words? I'm really just
> babbling some random thoughts here.
Hm, maybe we could do that for this interim period. The question is
what UI can help users here. It should be simple and quick. (We can
easily have a few languages with two dozen modules, which could result
in many "unknown state" translations.)
Gabor
|
|
| Re: updating translations: how valuable
is user data after all? |

|
2007-05-22 16:12:16 |
|
As to replacing the files with the Drupal 6 files: there's nothing preventing us from creating a two-step scenario here. One update would alter the table and set the flags on the initial Drupal 5 installation, thus using the existing PO files (if any!). The other update would continue the processing (if needed) when the new Drupal 6 files are in place.
So then the process would go like this:
1. Every string MATCHING an existing string in a PO file would be set to "not modified"
2. Every string NOT MATCHING an existing string in a PO file would be set to "modified"
3. Every string that doesn't exist in a PO file would be set to "modified"
Actually, this really makes sense: the user who's upgrading will most likely be content with the translations as they appear on his Drupal 5 site, so it would be most reasonable to set a non-matching/non-existent string as modified to avoid overwriting it in future PO file dumps.
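The three rules above can be sketched as follows, in Python rather than Drupal code. It assumes the database translations and the PO entries found on the Drupal 5 site have both been loaded into plain dicts keyed by source string; the function name and the placeholder values are hypothetical.

```python
def classify(db, po_catalog):
    """Map each source string in db to "modified" or "not modified"."""
    status = {}
    for source, translation in db.items():
        if source in po_catalog and po_catalog[source] == translation:
            # Rule 1: identical to what the PO file ships.
            status[source] = "not modified"
        else:
            # Rule 2 (differs from the PO file) and rule 3 (not in any
            # PO file at all) both protect the stored translation.
            status[source] = "modified"
    return status


db = {"Save": "save-v1", "Home": "home-custom", "My block": "block-custom"}
po_catalog = {"Save": "save-v1", "Home": "home-v1"}
print(classify(db, po_catalog))
```

Rules 2 and 3 collapse into one branch because both err on the side of keeping what the site currently shows.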
By the way, how does Drupal currently distinguish PO-imported strings from web-modified strings? Or does it just overwrite everything on a PO dump?
|
| Re: updating translations: how valuable
is user data after all? |

|
2007-05-22 17:08:42 |
Ashraf Amayreh wrote:
> As to replacing the files with the drupal 6 files. There's nothing
> preventing us from creating a two step scenario here, one would alter
> the table and set the flags on the initial drupal 5 installation, thus
> using the existing PO files (if any!). The other update would continue
> the processing (if needed) when the new drupal 6 files are in place.
Well, for how many of your Drupal 5 sites do you have the PO files
handy? Even if you have them, one of the reasons for introducing split
(smaller) PO files in Drupal 6 is that we cannot read and parse a big
PO file all at once if we are running under tight server resources
(like a busy server with a default PHP time limit).
> So then the process would go like this:
>
> 1. Every string MATCHING an existing string in a PO file would be set to
> "not modified"
> 2. Every string NOT MATCHING an existing string in a PO file would be
> set to "modified"
> 3. Every string that doesn't exist in a PO file would be set to "modified"
>
> Actually, this really makes sense, the user who's upgrading will most
> likely be content with the translations as they appear on his drupal 5
> site, so it would be most reasonable to set a non-matching/non-existent
> string as modified to avoid overwriting it in future PO file dumps.
>
> By the way, how does drupal currently distinguish PO imported strings
> from web-modified strings? Or does it just overwrite everything on a PO
> dump?
There is nothing to distinguish them. This is our upgrade problem, and
it will remain our upgrade problem if we don't do something about it.
Gabor
|
|
| Re: updating translations: how valuable
is user data after all? |

|
2007-05-22 17:50:00 |
|
The only two options I see in this case are either to give them a status of "unknown" or to somehow query the user for the behavior he wants during the update (either to preserve them, modified bit set, or to make them overridable, modified bit unset). How we can query a user during an update is a mystery to me.
I would go for creating a third status of unknown, as this will open up other possibilities later on, since they will at least be distinguishable from normal strings. It will also buy some time if the code freeze is "frozen".
What we could do with them after that is open to suggestions, but I can think of a scenario where importing future PO files could check for their presence and then set their status to "modified"/"not modified" according to whether they match the PO file strings or not. There's really no totally clean way to solve this. Any assumptions on behalf of the user could have really bad consequences.
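A minimal sketch of that scenario, in Python rather than Drupal code. The status constants, the dict-based table and the function names are assumptions for illustration: the upgrade marks every existing row "unknown", and a later PO import resolves the state by comparison before any overwrite is allowed.

```python
UNKNOWN, NOT_MODIFIED, MODIFIED = "unknown", "not modified", "modified"


def mark_all_unknown(db):
    """Upgrade step: nothing is known about existing rows yet."""
    for row in db.values():
        row["status"] = UNKNOWN


def import_po(db, po_entries):
    """Import PO entries, resolving 'unknown' rows instead of overwriting."""
    for source, translation in po_entries.items():
        row = db.get(source)
        if row is None:
            db[source] = {"translation": translation, "status": NOT_MODIFIED}
        elif row["status"] == UNKNOWN:
            # First contact with a PO file: decide the status by
            # comparison and leave the stored translation untouched.
            matches = row["translation"] == translation
            row["status"] = NOT_MODIFIED if matches else MODIFIED
        elif row["status"] == NOT_MODIFIED:
            row["translation"] = translation  # safe to update
        # MODIFIED rows are never overwritten.


db = {"Save": {"translation": "save-v1"}, "Home": {"translation": "home-custom"}}
mark_all_unknown(db)
import_po(db, {"Save": "save-v1", "Home": "home-v1", "Cancel": "cancel-v1"})
import_po(db, {"Save": "save-v2", "Home": "home-v2"})
print(db)
```

One consequence of this design is that a string which differs only because the translation team corrected it in the PO file also ends up "modified", so the old translation is kept.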
|
| Re: updating translations: how valuable
is user data after all? |
|
2007-05-23 04:44:15 |
On 22 May 2007, 12:53, Gabor Hojtsy wrote:
> Hi guys,
>
> Now Drupal 6.x-dev includes cool features to import PO files
> automatically at every logical step:
>
> [snip...]
>
> Note that an update will not *remove* anything from the DB because we
> don't know what we can remove as explained above. It can *overwrite*
> stuff though, and problems are around these overwrites.
>
> So how should the update paths work for Drupal and for modules/themes?
>
Forgive me if this is a dumb question that has been discussed before..
How come Drupal does not use the native gettext 'mo' format (binary po)
for string translation?
Why is the process of copying strings from the 'po' into the database
needed? Is it meaningful in terms of performance?
Taking this a bit further, if 'mo' files were used *instead* of the
database, this problem could be easily solved by just letting the web
interface put strings in the database which will have precedence over
the 'mo' strings.
Again, sorry if this is way off.. I was just wondering about this issue
ever since I met with Drupal..
--Yuval
|
|
| Re: updating translations: how valuable
is user data after all? |

|
2007-05-23 12:57:27 |
יובל האגר wrote:
> Forgive me if this is a dumb question that have been discussed before..
>
> How come Drupal does not use the native gettext 'mo' format (binary po) for
> strings translation?
> Why is the process of copying strings from the 'po' into the database is
> needed? Is it meaningful in terms of performance?
1. We should not expect the PHP installations of Drupal users to have
the gettext extension loaded, so we cannot build on that. Gettext is
not a commonly installed PHP extension as far as we have heard/imagine
(no hard evidence though).
2. Anyway, no one has actually implemented a locale module based on the
gettext extension that we could benchmark against the current
implementation. (It could be slower or quicker, we don't know.) But
see the previous point.
3. Finally, no one has come around to implement a MO reader and handler
in PHP (not using the gettext extension) and proven that it is better
than using the database. (Gerhard has some itch to scratch here if I
understand it right, so it might happen. No hard date on it though,
and probably as a contrib module first.)
> Taking this a bit further, if 'mo' files were used *instead* of the database,
> this problem could be easily solved by just letting the web interface put
> strings in the database which will have precedence over the 'mo' strings.
We would not really need the database then. If we have MO reader code,
MO writing is not far away. We could use a MO file for user-modified
strings.
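A rough sketch of how close MO reading and writing are to each other,
in Python rather than PHP. It assumes the little-endian GNU MO layout
(a seven-field header followed by two length/offset tables), skips the
optional hash table, plural forms and the metadata header entry, and
uses hypothetical function names and placeholder catalogs. The
`translate` helper shows the precedence idea: the user-modified catalog
is consulted before the shipped one.

```python
import struct

MO_MAGIC = 0x950412DE  # magic number of a GNU MO file


def write_mo(catalog):
    """Serialize {msgid: msgstr} to MO bytes (no hash table)."""
    # The format expects the original strings in sorted order.
    items = sorted((k.encode("utf-8"), v.encode("utf-8"))
                   for k, v in catalog.items())
    n = len(items)
    orig_table = 7 * 4                 # right after the header
    trans_table = orig_table + 8 * n
    data_start = trans_table + 8 * n   # strings follow both tables
    index, data = [], b""
    for text in [k for k, _ in items] + [v for _, v in items]:
        index.append((len(text), data_start + len(data)))
        data += text + b"\0"           # strings are NUL-terminated
    out = struct.pack("<7I", MO_MAGIC, 0, n, orig_table, trans_table,
                      0, data_start)
    for length, offset in index:
        out += struct.pack("<2I", length, offset)
    return out + data


def read_mo(blob):
    """Parse MO bytes back into {msgid: msgstr}."""
    magic, _rev, n, orig_table, trans_table = struct.unpack_from("<5I", blob, 0)
    if magic != MO_MAGIC:
        raise ValueError("not a little-endian MO file")
    catalog = {}
    for i in range(n):
        olen, opos = struct.unpack_from("<2I", blob, orig_table + 8 * i)
        tlen, tpos = struct.unpack_from("<2I", blob, trans_table + 8 * i)
        msgid = blob[opos:opos + olen].decode("utf-8")
        catalog[msgid] = blob[tpos:tpos + tlen].decode("utf-8")
    return catalog


def translate(msgid, user_catalog, shipped_catalog):
    """User-modified strings take precedence over shipped ones."""
    if msgid in user_catalog:
        return user_catalog[msgid]
    return shipped_catalog.get(msgid, msgid)


shipped = read_mo(write_mo({"Save": "save-v1", "Home": "home-v1"}))
user = read_mo(write_mo({"Home": "home-custom"}))
print(translate("Home", user, shipped), translate("Save", user, shipped))
```

Because the user catalog lives in its own file, updating the shipped
catalog can never overwrite it. Whether this would be faster than the
database is exactly the open question raised above.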
Summary: yes, this is a possibility, but no one has explored it yet
and proven that it is superior to what we do now.
Gabor
|
|