Thread: comments on Unicode document

2006-12-12 02:23:04
James,
I have reviewed your Unicode document. Below are my responses:
==
http://pylonshq.com/project/pylonshq/browser/Pylons/trunk/docs/internationalization.txt
==
This is a lot of non-Pylons-specific work!
==
message you will have run into a problem
Get rid of "will".
==
The good news is that Python has great Unicode support though so the
rest of
Try:
The good news is that Python has great Unicode support, so the
rest of
==
Unicode can represent every possible character
pretty much any character in any writing system in widespread use today
==
For real world use it is recommended that you use the UTF-8
encoding for your file but you must be sure that your text
editor actually saves the file as UTF-8 otherwise the Python
interpreter will try to parse UTF-8 characters but they will
actually be stored as something else.
This is a run-on sentence.
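To see why a mismatch between the editor's encoding and the one Python expects matters, here is a minimal Python 3 sketch (Latin-1 is just an assumed example of the "something else"): the same text has different bytes under each encoding, and decoding with the wrong one either produces garbage or fails outright.

```python
# The same text under two encodings, and what goes wrong when the bytes
# and the assumed encoding disagree. Latin-1 is an illustrative choice.
text = "caf\u00e9"  # "café"

utf8_bytes = text.encode("utf-8")      # b'caf\xc3\xa9'
latin1_bytes = text.encode("latin-1")  # b'caf\xe9'

# Bytes written as UTF-8 but read as Latin-1 decode "successfully"
# into the wrong characters (mojibake).
misread = utf8_bytes.decode("latin-1")

# Bytes written as Latin-1 but read as UTF-8 are rejected, roughly what
# the interpreter reports when a source file's bytes don't match the
# encoding it expects.
try:
    latin1_bytes.decode("utf-8")
    strict_failure = False
except UnicodeDecodeError:
    strict_failure = True

print(misread, strict_failure)
```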
==
If you are working with Unicode in detail you might also be
interested in the ``unicodedata`` module can be used
s/module/module which/g
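For illustration, a short Python 3 sketch of two things ``unicodedata`` is commonly used for: looking up a character's official name and normalising equivalent sequences so they compare equal.

```python
import unicodedata

# Look up the official Unicode name of a character.
name = unicodedata.name("\u00e9")  # 'LATIN SMALL LETTER E WITH ACUTE'

# "é" can be one precomposed code point or "e" plus a combining accent.
precomposed = "\u00e9"
decomposed = "e\u0301"

# They compare unequal as raw strings...
same_raw = precomposed == decomposed
# ...but equal once both are normalised to the same form (NFC here).
same_nfc = unicodedata.normalize("NFC", decomposed) == precomposed

print(name, same_raw, same_nfc)
```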
==
also different using Unicode
Try:
also be different using Unicode
==
XML parsers and SQL database
Try:
XML parsers and SQL databases
==
I think the document is inconsistent with regard to English vs. American
spelling.
==
Here is an example of how to reading Unicode
Try:
Here is an example of how to read Unicode
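A minimal sketch of reading and writing Unicode text, assuming Python 3, where the built-in ``open()`` takes an ``encoding`` argument (Python 2 code of this era would use ``codecs.open()`` instead). ``read_text`` and ``write_text`` are hypothetical helper names.

```python
import os
import tempfile


def read_text(path, encoding="utf-8"):
    """Read a file and return its contents decoded to text (str)."""
    with open(path, encoding=encoding) as f:
        return f.read()


def write_text(path, text, encoding="utf-8"):
    """Encode text with the given encoding and write it to path."""
    with open(path, "w", encoding=encoding) as f:
        f.write(text)


# Round trip through a temporary file.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    write_text(path, "Gr\u00fc\u00df Gott")
    with open(path, "rb") as f:
        raw = f.read()          # the bytes actually stored on disk
    decoded = read_text(path)   # decoded back to text
finally:
    os.remove(path)
```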
==
preform input and output
Try:
perform input and output
==
should also submit UTF-8 to back to your
Try:
should also submit UTF-8 back to your
==
In reality browsers don't always return data in the same
encoding you set
If you specify that your output is UTF-8, generally the Web browser will
give you UTF-8. If you want something else, you can use the following
on each form tag:
<form accept-charset="US-ASCII" ...>
However, be forewarned that if the user tries to give you non-ASCII
text, then:
* Firefox will translate the non-ASCII text into HTML entities.
* IE will ignore your suggested encoding and give you UTF-8 anyway.
The lesson to be learned is that if you output UTF-8, you had better be
prepared to accept UTF-8.
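To illustrate the last point, a Python 3 sketch using the standard library's ``urllib.parse``: a form field that the browser percent-encoded as UTF-8 decodes cleanly with ``encoding="utf-8"``, and turns into mojibake if the server assumes Latin-1 instead.

```python
from urllib.parse import parse_qs, urlencode

# A form field containing "é", percent-encoded as UTF-8 the way a
# browser would send it after receiving a UTF-8 page.
query = urlencode({"name": "caf\u00e9"}, encoding="utf-8")

# Decode with the same encoding the page was served in...
fields = parse_qs(query, encoding="utf-8", errors="strict")

# ...versus guessing wrong: the bytes C3 A9 become two Latin-1 characters.
wrong = parse_qs(query, encoding="latin-1")
```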
==
use an algorithm to analyse the input and guess the encoding
based on probabilities.
For instance, if you get a file and you don't know what encoding it is
in, you can often rename the file with a .txt extension and then try to
open it in Firefox. Then you can use the "View > Character Encoding"
menu to try to auto-detect the encoding.
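A much cruder stand-in for that kind of detection, sketched in Python 3: ``guess_decode`` (a hypothetical helper) simply tries a list of candidate encodings in order. It is not probabilistic; it only reports the first encoding that *can* decode the bytes. For real statistical detection, a third-party library such as chardet would be the usual choice.

```python
def guess_decode(data, candidates=("utf-8", "cp1252", "latin-1")):
    """Return (text, encoding) for the first candidate that decodes data.

    A simple fallback chain, not probabilistic detection. Latin-1 maps
    every byte to a character, so listing it last guarantees a result.
    """
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("no candidate encoding could decode the data")
```

UTF-8 goes first because random non-UTF-8 bytes rarely form valid UTF-8, so a successful UTF-8 decode is a fairly strong signal; the later candidates never fail, which is exactly why they prove little.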
==
For example MySQL's Unicode documentation is here
Also note that you need to consider both the encoding of the database
and the encoding used by the database driver.
If you're using MySQL together with SQLAlchemy, see the following, as
there are some bugs in MySQLdb that you'll need to work around:
http://www.mail-archive.com/sqlalchemy@googlegroups.com/msg00366.html
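Even the simplest setup has both layers. As a hedged sketch, Python 3's bundled ``sqlite3`` module stands in for MySQL/MySQLdb here (an assumption made only to keep the example self-contained; none of the MySQL settings apply): ``PRAGMA encoding`` reports how the database stores text, and the driver hands text back as ``str``.

```python
import sqlite3

# sqlite3 is a stand-in for MySQL here because it ships with Python.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE greetings (id INTEGER PRIMARY KEY, text TEXT)")

original = "\u3053\u3093\u306b\u3061\u306f"  # "konnichiwa" in hiragana

# Pass text as a bound parameter; the driver handles the encoding.
conn.execute("INSERT INTO greetings (text) VALUES (?)", (original,))

(stored,) = conn.execute("SELECT text FROM greetings").fetchone()
# The database-side storage encoding, reported by SQLite itself.
(db_encoding,) = conn.execute("PRAGMA encoding").fetchone()
conn.close()
```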
==
Applying this to Web Programming
There's a ton of very good general information above this section, but
not enough specific information in this section. This is partly because
it's not really done yet (e.g. bug #135), and partly because there are
so many variations of:
* Database
* Database driver
* Templating engine
==
There is also a ``h._()`` method which does the same as ``h.gettext()``.
Since you're encouraging the user to always use unicode objects within
the application, he should use ugettext instead of gettext. That means
it would be a lot more convenient if h._ pointed to ugettext instead of
gettext.
==
h.gettext('Hello')
By default, your application should make use of ugettext, and per my
earlier statement, you should call it _.
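For comparison, a minimal sketch of binding ``_`` in Python 3, where ``ugettext`` no longer exists because ``gettext()`` itself returns Unicode text. ``NullTranslations`` stands in for a real catalog; ``translations.install()`` would instead put ``_`` into the builtins.

```python
import gettext

# NullTranslations echoes messages unchanged; it stands in here for a
# catalog loaded from a real .mo file.
translations = gettext.NullTranslations()

# In Python 3, gettext() returns str (Unicode text), so it plays the
# role that ugettext() played in Python 2.
_ = translations.gettext

greeting = _("Hello")
```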
==
to handle language internationalization
Delete the word "language".
==
Depending on the tool you use for translations, you will need to
be familiar with some or all of these files.
Delete this sentence.
==
You will therefor need to use an external program to perform
these tasks.
I highly recommend using xgettext. Python's gettext utility has some
bugs, especially regarding plurals.
==
There are various tools available to aid in translating. You may
use whichever you prefer.
This should be a new paragraph. It has nothing to do with the preceding
sentence.
==
for editing PO files and generate MO files.
s/generate/generating/g
==
for the KDE window manager on Linux.
for KDE. (It's not just a window manager, and it runs on more than just
Linux.)
==
You can then enter your translations in whatever charset you
chose at in the project info tab.
Delete "at".
==
Your ``i18n`` directory should look like this when you have
finished::
    i18n/translate_demo.pot
    i18n/en/translate_demo.po
    i18n/en/LC_MESSAGES/translate_demo.mo
    i18n/es/translate_demo.po
    i18n/es/LC_MESSAGES/translate_demo.mo
    i18n/fr/translate_demo.po
    i18n/fr/LC_MESSAGES/translate_demo.mo
I put the .po files right next to the .mo files. That's how it works in
other applications.
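At runtime Python's ``gettext`` module only looks for ``<localedir>/<lang>/LC_MESSAGES/<domain>.mo``, which is why the layout above works whichever directory the .po files sit in. A Python 3 sketch of that lookup, using empty placeholder .mo files in a temporary directory purely to exercise ``gettext.find()`` (they are never parsed); ``make_layout`` is a hypothetical helper.

```python
import gettext
import os
import tempfile


def make_layout(root, domain, languages):
    """Create <root>/<lang>/LC_MESSAGES/<domain>.mo placeholder files."""
    for lang in languages:
        msgdir = os.path.join(root, lang, "LC_MESSAGES")
        os.makedirs(msgdir, exist_ok=True)
        open(os.path.join(msgdir, domain + ".mo"), "wb").close()


with tempfile.TemporaryDirectory() as localedir:
    make_layout(localedir, "translate_demo", ["en", "es", "fr"])

    # find() returns the path of the first matching .mo file.
    es_path = gettext.find("translate_demo", localedir, languages=["es"])
    # A regional variant falls back to the base language directory.
    es_mx_path = gettext.find("translate_demo", localedir, languages=["es_MX"])
    # No German catalog: find() returns None...
    de_path = gettext.find("translate_demo", localedir, languages=["de"])
    # ...and translation(..., fallback=True) returns a pass-through object.
    de = gettext.translation("translate_demo", localedir,
                             languages=["de"], fallback=True)
    de_hello = de.gettext("Hello")

    es_rel = os.path.relpath(es_path, localedir)
    es_mx_rel = os.path.relpath(es_mx_path, localedir)
```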
==
Then on each controller call the language to be used could be
read from the session
Syntax error
==
You can now set the language used in a controller on the fly.
Aquarium has some code that I gave to Ben for pulling the preferred
language settings from the browser headers. You should use and talk
about this.
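The Aquarium code itself isn't reproduced here. As a rough idea of what such code has to do, here is a hypothetical Python 3 parser that orders the tags in an ``Accept-Language`` header by their ``q`` values.

```python
def parse_accept_language(header):
    """Return the language tags in an Accept-Language header, best first.

    Hypothetical helper, not the Aquarium code: tags with q=0 are
    dropped, and tags with equal q keep the order they appeared in.
    """
    weighted = []
    for position, item in enumerate(header.split(",")):
        parts = [part.strip() for part in item.split(";")]
        tag = parts[0]
        if not tag:
            continue
        quality = 1.0  # q defaults to 1 when not given
        for param in parts[1:]:
            name, sep, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed q as "not acceptable"
        if quality > 0:
            # Sort key: highest quality first, then original position.
            weighted.append((-quality, position, tag))
    weighted.sort()
    return [tag for neg_quality, position, tag in weighted]
```

Tags such as ``fr-CH`` would still need converting to gettext's ``fr_CH`` style before being passed as ``languages=`` to ``gettext.translation()``.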
==
If your code calls ``h.ugettext()`` with a string that doesn't
exist in your language catalogue, the string passed to
``h.ugettext()`` is returned instead.
Actually, gettext's understanding of fallbacks is much more complicated
than this. Furthermore, the Aquarium code mentioned above uses browser
settings so that gettext can fall back to another language that the user
knows.
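Python's ``gettext`` module models this as a chain of catalogs: ``add_fallback()`` attaches another translations object that is consulted when a message is missing, and ``gettext.translation()`` chains several .mo files this way when it finds more than one. A Python 3 sketch, with a hypothetical in-memory ``DictTranslations`` class standing in for compiled catalogs.

```python
import gettext


class DictTranslations(gettext.NullTranslations):
    """Hypothetical in-memory catalog, standing in for a compiled .mo file."""

    def __init__(self, catalog):
        super().__init__()
        self._messages = dict(catalog)

    def gettext(self, message):
        if message in self._messages:
            return self._messages[message]
        # Defer to the next catalog in the chain, if any.
        if self._fallback:
            return self._fallback.gettext(message)
        return message


# Suppose the browser says the user reads French first, then Spanish.
fr = DictTranslations({"Hello": "Bonjour"})
es = DictTranslations({"Hello": "Hola", "World": "Mundo"})
fr.add_fallback(es)

hello = fr.gettext("Hello")      # found in the French catalog
world = fr.gettext("World")      # missing in French, found in Spanish
goodbye = fr.gettext("Goodbye")  # missing everywhere: returned unchanged
```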
==
resp.write("%s: %s %s<br />" %
    (h.get_lang(), h.ugettext('Hello'), h.ugettext('World!')))
Breaking up sentences in this way is very dangerous because some
grammars might require the order of the words to be different. I know
this is just an example, but still.
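A safer shape, sketched in Python 3: translate one complete message and use named placeholders. A translator can move ``%(lang)s`` anywhere in the translated string, whereas positional ``%s`` values cannot be reordered with Python's ``%`` operator. A pass-through catalog stands in for a real one, and ``greeting_line`` is a hypothetical helper.

```python
import gettext

# A pass-through catalog stands in for a real one here.
_ = gettext.NullTranslations().gettext


def greeting_line(lang):
    # One complete message with a named placeholder, so the translated
    # string decides where the language code goes.
    return _("%(lang)s: Hello World!") % {"lang": lang}


# Likewise, translate whole sentences rather than concatenated pieces.
warning = _("He told her not to go outside.")

line = greeting_line("en") + "<br />"
```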
==
h.ugettext('Hello')
By the way, this reminds me that xgettext has a hard time understanding
the call to ugettext with the "h." prefix. I generally state that in
both Python and Cheetah, you should use *just _*, without the "h." Of
course, you must take efforts to make this work.
==
This means the best solution to ensure all strings are picked up
for translation is to create a file in ``lib`` with an
appropriate filename, ``i18n.py`` for example, and then add a
list of all the strings which appear in your templates so that
the ``lang_extract`` command can then extract the strings in
``lib/i18n.py`` for translation and use the translated versions
in your templates as well.
I am strongly opposed to this because it requires too much work and is
too fragile. Myghty, for instance, can be compiled down to Python. The
same is true of Cheetah.
==
Of course, if you are using a templating system such as Myghty
or Cheetah and your cache directory is in the default location
or elsewhere within your project's filesystem, you will probably
find that all templates have been cached as Python files during
the course of the development process and so the
``lang_extract`` command will successfully pick up strings to
translate from the cached files anyway.
In past projects, I have used a Makefile to ensure that every template
got compiled to Python to make sure that every template was scanned.
By the way, xgettext won't completely barf if you feed it the naked
template because its knowledge of Python is very small. However, this
is admittedly a hack.
==
If you wish to use plural forms in your application...
One thing to keep in mind is that other languages don't have the same
plural forms as English. While English only has 2 (singular and
plural), Slovenian has 4! That means that you must use gettext's
support for pluralization if you hope to get pluralization right.
Specifically, the following will not work:
# BAD!
if n == 1:
    msg = h._("There was no dog.")
else:
    msg = h._("There were no dogs.")
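A Python 3 sketch of the gettext way: a single ``ngettext()`` call, with the catalog rather than the code choosing the form. ``NullTranslations`` applies the English rule (singular only when n == 1). ``slovenian_plural_index`` is a hypothetical helper that spells out, for illustration, the four-form Plural-Forms expression usually given for Slovenian; in practice that expression lives in the .po header and gettext evaluates it for you.

```python
import gettext

translations = gettext.NullTranslations()


def dog_message(n):
    # One ngettext() call; the catalog decides which form applies.
    template = translations.ngettext(
        "There is %(n)d dog.", "There are %(n)d dogs.", n
    )
    return template % {"n": n}


def slovenian_plural_index(n):
    # Python version of the Slovenian rule (nplurals=4):
    # n%100==1 ? 0 : n%100==2 ? 1 : n%100==3 || n%100==4 ? 2 : 3
    if n % 100 == 1:
        return 0
    if n % 100 == 2:
        return 1
    if n % 100 in (3, 4):
        return 2
    return 3
```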
==
You should also remind the user not to piece sentences together because
certain languages might need to invert the grammars:
# BAD!
msg = h._("He told her ")
msg += h._("not to go outside.")
==
James, thanks for all your hard work!!!
Best Regards,
-jj
--
http://jjinux.blogspot.com/
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the
Google Groups "pylons-discuss" group.
To post to this group, send email to pylons-discuss googlegroups.com
To unsubscribe from this group, send email to
pylons-discuss-unsubscribe googlegroups.com
For more options, visit this group at http://groups.google.com/group/pylons-discuss?hl=en
-~----------~----~----~----~------~----~------~--~---

2006-12-12 15:03:41
Hi jj,
> I have reviewed your Unicode document. Below are my responses:
Blimey, that was quick! I only checked in the first draft an hour or two
ago! Very much appreciated!
> This is a lot of non-Pylons-specific work!
It is, but I just think it is useful to have a definitive guide all in
one place hopefully!
> Get rid of "will".
Rephrased it.
> The good news is that Python has great Unicode support, so the
> rest of
Changed.
> pretty much any character in any writing system in widespread use today
Corrected.
> For real world use it is recommended that you use the UTF-8
> encoding for your file but you must be sure that your text
> editor actually saves the file as UTF-8 otherwise the Python
> interpreter will try to parse UTF-8 characters but they will
> actually be stored as something else.
>
> This is a run-on sentence.
Sorry, what do you mean by that?
> s/module/module which/g
> also be different using Unicode
> XML parsers and SQL databases
All done.
> ==
>
> I think the document is inconsistent with regard to English vs. American
> spelling.
>
> ==
Well my rule is this: Use English everywhere except in commonly used
computer terms when Americans get cross and send me emails telling me
I've misspelled words! It is a bit inconsistent but it means I'm using
English but with American spellings of Internationalization and
Localization.
If anything really leaps out I'm happy to change it though.
> Here is an example of how to read Unicode
> perform input and output
>
> ==
>
> should also submit UTF-8 to back to your
>
> Try:
>
> should also submit UTF-8 back to your
>
> ==
>
> In reality browsers don't always return data in the same
> encoding you set
>
> If you specify that your output is UTF-8, generally the Web browser will
> give you UTF-8. If you want something else, you can use the following
> on each form tag:
>
> <form accept-charset="US-ASCII" ...>
>
> However, be forewarned that if the user tries to give you non-ASCII
> text, then:
>
> * Firefox will translate the non-ASCII text into HTML entities.
>
> * IE will ignore your suggested encoding and give you UTF-8 anyway.
>
> The lesson to be learned is that if you output UTF-8, you had better be
> prepared to accept UTF-8.
>
> use an algorithm to analyse the input and guess the encoding
> based on probabilities.
>
> For instance, if you get a file and you don't know what encoding it is
> in, you can often rename the file with a .txt extension and then try to
> open it in Firefox. Then you can use the "View > Character Encoding"
> menu to try to auto-detect the encoding.
>
I've re-written this section incorporating your suggestions.
>
> For example MySQL's Unicode documentation is here
>
> Also note that you need to consider both the encoding of the database
> and the encoding used by the database driver.
>
> If you're using MySQL together with SQLAlchemy, see the following, as
> there are some bugs in MySQLdb that you'll need to work around:
>
> http://www.mail-archive.com/sqlalchemy@googlegroups.com/msg00366.html
Added.
> There's a ton of very good general information above this section, but
> not enough specific information in this section. This is partly because
> it's not really done yet (e.g. bug #135), and partly because there are
> so many variations of:
>
> * Database
> * Database driver
> * Templating engine
I agree completely. I just thought it was best to set out what is
definitely correct at the moment rather than things that may or may not
work. We can use this as a starting point to highlight other areas which
Pylons doesn't handle adequately and add the documentation as it becomes
possible.
> Since you're encouraging the user to always use unicode objects within
> the application, he should use ugettext instead of gettext. That means
> it would be a lot more convenient if h._ pointed to ugettext instead of
> gettext.
We can't change that because it might break existing code that isn't
expecting unicode data. On my machine, though, I found gettext() sometimes
returns unicode anyway. If this is the case on other people's setups too,
I'd be happy to make the change.
> By default, your application should make use of ugettext,
The docs already say this.
> and per my
> earlier statement, you should call it _.
But it might not be backwards compatible. If you tell me it is, I'll
change it?
> Delete the word "language".
> Delete this sentence.
> I highly recommend using xgettext. Python's gettext utility has some
> bugs, especially regarding plurals.
>
> There are various tools available to aid in translating. You may
> use whichever you prefer.
>
> This should be a new paragraph. It has nothing to do with the preceding
> sentence.
OK.
> ==
>
> for editing PO files and generate MO files.
>
> s/generate/generating/g
Yup.
>
> for the KDE window manager on Linux.
>
> for KDE. (It's not just a window manager, and it runs on more than just
> Linux.)
OK.
>
> ==
>
> You can then enter your translations in whatever charset you
> chose at in the project info tab.
>
> Delete "at".
Yup!
> I put the .po files right next to the .mo files. That's how it works in
> other applications.
Sensible.
> ==
>
> Then on each controller call the language to be used could be
> read from the session
>
> Syntax error
Sorry, do you mean lowercase t on Then?
> Aquarium has some code that I gave to Ben for pulling the preferred
> language settings from the browser headers. You should use and talk
> about this.
Ooh, great. Any chance you can send it to me too? I didn't know about that!
>
> If your code calls ``h.ugettext()`` with a string that doesn't
> exist in your language catalogue, the string passed to
> ``h.ugettext()`` is returned instead.
>
> Actually, gettext's understanding of fallbacks is much more complicated
> than this. Furthermore, the Aquarium code mentioned above uses browser
> settings so that gettext can fall back to another language that the user
> knows.
OK, if you could send the code I'll have a look.
>
> resp.write("%s: %s %s<br />" %
>     (h.get_lang(), h.ugettext('Hello'), h.ugettext('World!')))
>
> Breaking up sentences in this way is very dangerous because some
> grammars might require the order of the words to be different. I know
> this is just an example, but still.
Fair point.
> By the way, this reminds me that xgettext has a hard time understanding
> the call to ugettext with the "h." prefix. I generally state that in
> both Python and Cheetah, you should use *just _*, without the "h." Of
> course, you must take efforts to make this work.
Yes, I wasn't sure about this either. If the general consensus is to use
ugettext() rather than h.ugettext() I'm happy to drop them from the
helpers.
>
> ==
>
> This means the best solution to ensure all strings are picked up
> for translation is to create a file in ``lib`` with an
> appropriate filename, ``i18n.py`` for example, and then add a
> list of all the strings which appear in your templates so that
> the ``lang_extract`` command can then extract the strings in
> ``lib/i18n.py`` for translation and use the translated versions
> in your templates as well.
>
> I am strongly opposed to this because it requires too much work and is
> too fragile. Myghty, for instance, can be compiled down to Python. The
> same is true of Cheetah.
>
> In past projects, I have used a Makefile to ensure that every template
> got compiled to Python to make sure that every template was scanned.
>
> By the way, xgettext won't completely barf if you feed it the naked
> template because its knowledge of Python is very small. However, this
> is admittedly a hack.
OK, I've reworded it.
> ==
>
> If you wish to use plural forms in your application...
>
> One thing to keep in mind is that other languages don't have the same
> plural forms as English. While English only has 2 (singular and
> plural), Slovenian has 4! That means that you must use gettext's
> support for pluralization if you hope to get pluralization right.
> Specifically, the following will not work:
>
> # BAD!
> if n == 1:
>     msg = h._("There was no dog.")
> else:
>     msg = h._("There were no dogs.")
>
> ==
>
> You should also remind the user not to piece sentences together because
> certain languages might need to invert the grammars:
>
> # BAD!
> msg = h._("He told her ")
> msg += h._("not to go outside.")
>
> ==
Yup, these are now incorporated too.
> James, thanks for all your hard work!!!
You are most welcome. I just hope other people find the article useful.
Thanks for all your feedback too.
Here are the new changes for those that are interested:
http://pylonshq.com/project/pylonshq/changeset/1585
Outstanding issues are therefore:
* Whether to drop h.
* Whether to point _ to ugettext rather than gettext
* How to integrate the aquarium code
Cheers,
James

2006-12-12 17:22:57
On 12/12/06, James Gardner <james pythonweb.org> wrote:
> > For real world use it is recommended that you use the UTF-8
> > encoding for your file but you must be sure that your text
> > editor actually saves the file as UTF-8 otherwise the Python
> > interpreter will try to parse UTF-8 characters but they will
> > actually be stored as something else.
> >
> > This is a run-on sentence.
>
> Sorry, what do you mean by that?
It means it takes too many twists and turns. "For real world ... but
you must ... otherwise the Python ... but they will ..." If you begin
a new sentence at "otherwise" it would be easier to read.
In other words, don't write like Hemingway.
"It is recommended that you use" is a bit wordy and passive. How
about "For real-world use I'd recommend the UTF-8 encoding...".
> > I think the document is inconsistent with regard to English vs. American
> > spelling.
> Well my rule is this: Use English everywhere except in commonly used
> computer terms when Americans get cross and send me emails telling me
> I've misspelled words! It is a bit inconsistent but it means I'm using
> English but with American spellings of Internationalization and
> Localization.
We just had this issue with Linux Gazette (linuxgazette.net), a zine
I've volunteered on for several years. Our rule is "use either
American or British spelling but be consistent". Unfortunately many
yanks don't know about anything east of the Statue of Liberty or west
of the Golden Gate Bridge. Some words have gotten a universal
American spelling for the computer term but not otherwise so I've
heard: computer program vs theatre programme. (Of course I would
spell it "theater program".) I haven't followed i18n enough to know
whether "internationalization" and "localization" are that way. I
would just spell them as I naturally would, and send complainers a
link like this:
http://www.phrases.org.uk/bulletin_board/13/messages/785.html
--
Mike Orr <sluggoster gmail.com>

2006-12-12 19:22:27
On Dec 12, 2006, at 7:03 AM, James Gardner wrote:
>
>> Since you're encouraging the user to always use unicode objects within
>> the application, he should use ugettext instead of gettext. That means
>> it would be a lot more convenient if h._ pointed to ugettext instead of
>> gettext.
>
> We can't change that because it might break existing code that isn't
> expecting unicode data. On my machine, though, I found gettext() sometimes
> returns unicode anyway. If this is the case on other people's setups too,
> I'd be happy to make the change.
We've actually just recently changed trunk to use ugettext for _ by
default. We didn't want to change this in between 0.9.x versions, but
we thought it was warranted as:
o Many i18n setups are likely to want unicode strings (See
  http://pylonshq.com/project/pylonshq/ticket/126 )
o Our default templating engine now handles unicode
o It's not a heavily used part of the API (not many apps are
  internationalized)
Hopefully those affected will notice the large WARNING in the
changelog about it along with instructions on how to revert back to
the old behavior.
>
>
>> By the way, this reminds me that xgettext has a hard time understanding
>> the call to ugettext with the "h." prefix. I generally state that in
>> both Python and Cheetah, you should use *just _*, without the "h." Of
>> course, you must take efforts to make this work.
>
> Yes, I wasn't sure about this either. If the general consensus is to use
> ugettext() rather than h.ugettext() I'm happy to drop them from the
> helpers.
>
I ran a simple test on an old version (0.13.1) of xgettext, and it
handles the 'h.' prefix. Is this test sufficient?
import gettext
_('underscore')
gettext('gettext')
ugettext('ugettext')
ngettext('ngettext', 'ngettexts', 3)
ungettext('ungettext', 'ungettexts', 4)
h._('h underscore')
h.gettext('h gettext')
h.ugettext('h ugettext')
h.ngettext('h ngettext', 'h ngettexts', 3)
h.ungettext('h ungettext', 'h ungettexts', 4)
http://paste.lisp.org/display/32174
>
> * Whether to drop h.
> * Whether to point _ to ugettext rather than gettext
> * How to integrate the aquarium code
Can Ben or Shannon log a ticket about this one?
--
Philip Jenvey

2006-12-12 21:41:49
On 12/12/06, Mike Orr <sluggoster gmail.com> wrote:
> On 12/12/06, James Gardner <james pythonweb.org> wrote:
> > > For real world use it is recommended that you use the UTF-8
> > > encoding for your file but you must be sure that your text
> > > editor actually saves the file as UTF-8 otherwise the Python
> > > interpreter will try to parse UTF-8 characters but they will
> > > actually be stored as something else.
> > >
> > > This is a run-on sentence.
> >
> > Sorry, what do you mean by that?
I'm confused. I sent the original response, but it looks like James
sent a response including the sentence "Sorry, what do you mean by
that?". I don't have that email. Can someone forward it to me?
Thanks,
-jj
--
http://jjinux.blogspot.com/

2006-12-12 21:48:10
> > * How to integrate the aquarium code
>
> Can Ben or Shannon log a ticket about this one?
Done. I've also cut and pasted the appropriate code:
http://pylonshq.com/project/pylonshq/ticket/150#preview
I hope that's helpful.
Best Regards,
-jj
--
http://jjinux.blogspot.com/

2006-12-12 22:01:35
Ok, I found James's responses in the archive:
> > Then on each controller call the language to be used could be
> > read from the session
>
> > Syntax error
>
> Sorry, do you mean lowercase t on Then?
I think the sentence is grammatically incorrect and confused ;)
> Outstanding issues are therefore:
>
> * Whether to drop h.
It sounds like we can keep it and it works. If so, then that's fine by me.
> * Whether to point _ to ugettext rather than gettext
It sounds like we're now pointing to ugettext, which is very good.
> * How to integrate the aquarium code
I've created bug #150.
Happy Hacking!
-jj

2006-12-14 18:30:59
Hello,
I am new to Pylons and currently build i18n-ed applications in TurboGears.
One thing I'd like to see, coming from a TG background, is (semi-)automatic
handling of unicode decoding/encoding. E.g. in TurboGears I can set up the
framework to decode request parameters from utf8 and encode responses back
to utf8, as well as set the proper Content-Type header.
Maybe it doesn't make sense to couple it that tightly in Pylons, but
nevertheless, invoking decode_params and setting content-type in each
and every request is kind of boring.
A simple decorator would probably work.
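One possible shape for such a decorator, as a framework-agnostic Python 3 sketch. The handler signature (a dict of byte parameters in, text out) and the ``unicode_io`` name are assumptions for illustration, not a Pylons or TurboGears API.

```python
import functools


def unicode_io(encoding="utf-8"):
    """Hypothetical decorator: decode byte parameters on the way in and
    encode the returned text (plus a Content-Type header) on the way out."""
    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(params):
            decoded = {
                key.decode(encoding): value.decode(encoding)
                for key, value in params.items()
            }
            text = handler(decoded)
            headers = [("Content-Type", "text/html; charset=" + encoding)]
            return headers, text.encode(encoding)
        return wrapper
    return decorator


@unicode_io()
def hello(params):
    # Inside the handler everything is text (str).
    return "Hello, %s!" % params.get("name", "world")
```

Keeping the encoding in one decorator argument means the charset in the header and the bytes actually sent cannot drift apart.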
Regards,
Max.
P.S.: As for the document in general, it is quite impressive, although the
first third or half of it has nothing to do with Pylons. ;)

2006-12-19 02:06:18
> http://pylonshq.com/project/pylonshq/browser/Pylons/trunk/docs/internationalization.txt
Wow, great document....
> This is a lot of non-Pylons-specific work!
And this should really be added to the Python wiki, too... and
eventually should enter the standard Python documentation.
My notes:
http://pylonshq.com/project/pylonshq/browser/Pylons/trunk/docs/internationalization.txt#L88
stanardised => standardised
http://pylonshq.com/project/pylonshq/browser/Pylons/trunk/docs/internationalization.txt#L107
> This has the useful side effect that English text looks
> exactly the same in UTF-8 as it did in ISO-8859-1 and ASCII, because for every
> ISO-8859-1 character with hexadecimal value 0xXY, the corresponding Unicode
> code point is U+00XY.
This is not true for ISO-8859-1, since it defines characters in the
160-255 range too. Only the ASCII encoding (0-127) is a proper subset of
the UTF-8 encoding.
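A Python 3 sketch of the distinction: the code-point half of the quoted claim holds (Latin-1 byte 0xE9 is U+00E9), but above 0x7F the UTF-8 bytes differ from the Latin-1 bytes, so only ASCII text is byte-for-byte identical in both. ``byte_view`` is a hypothetical helper.

```python
def byte_view(text):
    """Return the Latin-1 and UTF-8 encodings of text, side by side."""
    return text.encode("latin-1"), text.encode("utf-8")


# Plain ASCII: identical bytes in both encodings.
ascii_latin1, ascii_utf8 = byte_view("Hello")

# "é" is U+00E9 in Unicode and byte 0xE9 in Latin-1...
e_acute = "\u00e9"
code_point = ord(e_acute)
# ...but UTF-8 encodes it as two bytes, 0xC3 0xA9.
accent_latin1, accent_utf8 = byte_view(e_acute)
```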
http://www.htmlhelp.com/reference/charset/ (ISO 8859-1 character set
overview)
This is a very good takeaway rule:
> The main rule is this::
> Your application should use Unicode for all strings internally, decoding
> any input to Unicode as soon as it enters the application and encoding the
> Unicode to UTF-8 or another encoding on output.
I'd just add 'and encoding the Unicode to UTF-8 or another encoding
ONLY on output.'
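The rule can be sketched as a function that decodes at one edge, works only on text in the middle, and encodes at the other (Python 3; ``handle_request`` is a hypothetical name). Working on text also means lengths count characters rather than bytes.

```python
def handle_request(raw_body, input_encoding="utf-8", output_encoding="utf-8"):
    # Decode once, at the edge where the bytes arrive...
    text = raw_body.decode(input_encoding)
    # ...work only with text in the middle...
    reply = "You sent %d characters: %s" % (len(text), text.upper())
    # ...and encode only at the edge where the bytes leave.
    return reply.encode(output_encoding)


raw = "stra\u00dfe".encode("utf-8")  # "straße": 6 characters, 7 bytes
response = handle_request(raw)
```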

2006-12-19 02:30:13
> For example MySQL's Unicode documentation is here
>
> Also note that you need to consider both the encoding of the database
> and the encoding used by the database driver.
>
> If you're using MySQL together with SQLAlchemy, see the following, as
> there are some bugs in MySQLdb that you'll need to work around:
>
> http://www.mail-archive.com/sqlalchemy@googlegroups.com/msg00366.html
Shannon, I've read the thread on the sqlalchemy list, but I never saw
the output of 'show create table ...'.
This is very important because different versions of MySQL might have
different defaults, and your Linux distro might also set its own
defaults.
The thing is that a conversion can happen in the MySQL C API and in the
MySQL DB too.
I've been using this code to connect to MySQL:
import MySQLdb
db = MySQLdb.connect(db='test', charset='utf8', use_unicode=True)
which, as I understand it, tells the MySQL C API to use UTF-8 over the
wire (charset='utf8'), and to use unicode objects on the Python side of
things (use_unicode=True). But also my unicode test table is
created like this:
CREATE TABLE `test_unicode_table` (
  `id` int(11) NOT NULL auto_increment,
  `test_column` varchar(5) collate utf8_unicode_ci default NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=2 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci