Thread: Htmltext and latin-1 characters

2006-05-10 21:15:29

Still trying to untangle my Quixote site that's having problems with foreign characters. It's a scientific environment, so people paste text with the degree symbol from Word documents, and the curly quotes come along too, sigh. Worse, things come from unknown character sets because Word on Windows is different from Word on Mac; other stuff comes from FileMaker, which uses different characters, etc. We've decided a wrong character is acceptable but exceptions are not. I also had a problem with MySQL truncating input at the first non-ASCII character, but I've got that fixed.

Now the problem is htmltext + Cheetah + str(). I made a Cheetah filter that smartly escapes non-htmltext values, and it's used throughout my application, some thirty templates.

    from Cheetah.Filters import Filter
    from quixote.html import htmlescape, htmltext

    class HtmltextFilter(Filter):
        """Safer than WebSafe: escapes values that aren't htmltext instances."""
        def filter(self, val, **kw):
            val = htmlescape(val)
            if isinstance(val, htmltext):
                return str(val)  # Cheetah > 1.0rc1 compatibility.
            else:
                return val

In this case it's trying to filter U"A\xa0B" retrieved from the database. That's "AB" with the degree symbol in between (strictly speaking, u'\xa0' is NO-BREAK SPACE; the degree sign is u'\xb0'). Voila:

    UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 1: ordinal not in range(128)

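
For readers on Python 3, where str is always Unicode, the failure above can be reproduced without Quixote or Cheetah. This is a minimal sketch with illustrative variable names: encoding the same text to ASCII raises the same kind of error, while an explicit Latin-1 or UTF-8 encoding, or a lossy error handler such as 'replace' or 'xmlcharrefreplace', does not.

```python
import unicodedata

text = "A\xa0B"  # Python 3 str is Unicode, like the thread's u"A\xa0B"

# Encoding to ASCII fails the same way as in the traceback.
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    failed_at = exc.start                 # index of the offending character
    failed_char = exc.object[exc.start]

# Latin-1 covers every code point below 256, so these succeed.
latin1_bytes = text.encode("latin-1")     # b'A\xa0B'
utf8_bytes = text.encode("utf-8")         # b'A\xc2\xa0B'

# Lossy but exception-free fallbacks ("a wrong character is acceptable").
ascii_replaced = text.encode("ascii", "replace")           # b'A?B'
ascii_charref = text.encode("ascii", "xmlcharrefreplace")  # b'A&#160;B'

print(failed_at, unicodedata.name(failed_char))  # 1 NO-BREAK SPACE
```
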
OK, let's try returning Unicode instead.

    return unicode(val, 'latin1')  # Cheetah > 1.0rc1 compatibility.

    TypeError: coercing to Unicode: need string or buffer, htmltext found

Darn it, why didn't htmltext subclass str!!! Peeking into the htmltext implementation, it stores the actual value in an attribute, .s:

    return unicode(val.s, 'latin1')  # Cheetah > 1.0rc1 compatibility.

    TypeError: decoding Unicode is not supported

How about this?

    return unicode(val.s)  # Cheetah > 1.0rc1 compatibility.

    UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 412: ordinal not in range(128)

F**k! OK, the trick I used in TurboGears:

    return unicode(val.s, 'latin1').encode('latin1')  # Cheetah > 1.0rc1 compatibility.

    TypeError: decoding Unicode is not supported

So how *do* you convert an htmltext object containing a non-ASCII character to either str or unicode? And how do you output it?

    >>> print htmltext(U"A\xa0B")
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 1: ordinal not in range(128)
    >>> print htmltext("A\xa0B")
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 1: ordinal not in range(128)
    >>> print htmltext("A\xa0B")

I tried sys.setdefaultencoding("latin1"), but that has to be done in the 'site' module; it's not available within a program.

Another problem is that all my controller methods instantiate a template and end with "return str(t)". I'll have to change that to "return unicode(t)" or "return t.respond()" or something; I'm not sure what. Plus, who knows how many htmltext objects are used as placeholder values (e.g., in Quixote forms)? So it looks like I'll have to make changes all over my program.

If you've been wondering why I've been making such a big deal these past few months about smart escaping in Cheetah and making Cheetah deal with Unicode, and whether/how the WebSafe filter needs to be made Unicode-friendly, this is why. It came to a head these past couple of weeks as people started posting reports containing the degree symbol, and a set of notifications that arrive by email started using "MASCULINE ORDINAL INDICATOR" ("\xba") instead of the proper "DEGREE SIGN" ("\xb0"), because both look like a circle on some Windows screens. That made another program choke, because I was converting one of them to ASCII ("degrees") and didn't know about the other. Sigh.

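
One way to keep look-alike characters from breaking a downstream ASCII-only program is to map the known offenders explicitly and let everything else degrade to '?'. The sketch below assumes Python 3; the LOOKALIKES table and the to_ascii() name are hypothetical, and the replacement word simply follows the "degrees" conversion mentioned above.

```python
import unicodedata

# Hypothetical table of known look-alikes; extend it as new offenders turn up.
LOOKALIKES = {
    0x00B0: " degrees",  # DEGREE SIGN, the intended character
    0x00BA: " degrees",  # MASCULINE ORDINAL INDICATOR, a frequent mistyping
}


def to_ascii(text):
    """Map known look-alikes, then turn anything else non-ASCII into '?'.

    A wrong character is tolerated; an exception is not.
    """
    mapped = text.translate(LOOKALIKES)
    return mapped.encode("ascii", "replace").decode("ascii")


# Confirm what the two characters actually are.
names = [unicodedata.name(ch) for ch in "\xb0\xba"]

print(to_ascii("Stored at 4\xba overnight"))  # Stored at 4 degrees overnight
```

Mapping by code point keeps the policy in one table, and the 'replace' error handler guarantees the final encode cannot raise.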
--
Mike Orr <sluggoster gmail.com>
_______________________________________________
Quixote-users mailing list
Quixote-users mems-exchange.org
http://mail.mems-exchange.org/mailman/listinfo/quixote-users

2006-05-10 21:23:44

You might find the WebHelpers html_quote function useful:
http://pylonshq.com/project/pylonshq/browser/WebHelpers/trunk/webhelpers/util.py

Specifically, the asciification:

    if not isinstance(s, basestring):
        if hasattr(s, '__unicode__'):
            s = unicode(s)
        else:
            s = str(s)
    s = cgi.escape(s, True)
    if isinstance(s, unicode):
        s = s.encode('ascii', 'xmlcharrefreplace')

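cgi.escape() was removed in Python 3.8, so a Python 3 version of the same logic would use html.escape() instead. This is a minimal sketch, not the WebHelpers code itself; note that html.escape(s, quote=True) also escapes the single quote, which cgi.escape(s, True) did not.

```python
import html


def html_quote(value):
    """Python 3 sketch of the asciification logic quoted above.

    Returns ASCII-only bytes: HTML-special characters are escaped and every
    non-ASCII character becomes a numeric character reference.
    """
    if not isinstance(value, str):
        value = str(value)  # Python 3 has a single text type, so one branch suffices
    value = html.escape(value, quote=True)  # stands in for the removed cgi.escape
    return value.encode("ascii", "xmlcharrefreplace")


print(html_quote("A\xa0B <tag>"))  # b'A&#160;B &lt;tag&gt;'
```
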
--
Ian Bicking / ianb colorstudy.com / http://blog.ianbicking.org

2006-05-11 01:46:47

On May 10, 2006, at 5:15 PM, Mike Orr wrote:
> Another problem is, all my controller methods instantiate a template
> and "return str(t)" it. I'll have to change that to "return
> unicode(t)" or "return t.respond()" or something; I'm not sure
> what. Plus who knows how many htmltext objects are used as placeholder
> values; e.g., Quixote forms. So it looks like I'll have to make
> changes all over my program.

FYI, in QPY the str of an h8 instance is always its UTF-8 encoding, so calling str on one will not fail.
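
In Python 3 terms, the property described here (converting to a byte string never fails because the encoding is always UTF-8) can be sketched with a str subclass that defines __bytes__. Utf8Text is a hypothetical name; this illustrates the idea and is not QPY's implementation.

```python
class Utf8Text(str):
    """Hypothetical str subclass whose byte form is always UTF-8."""

    def __bytes__(self):
        # bytes(obj) calls this; for a plain str, bytes("...") would raise
        # TypeError ("string argument without an encoding") instead.
        return self.encode("utf-8")


text = Utf8Text("A\xa0B")
as_bytes = bytes(text)
print(as_bytes)  # b'A\xc2\xa0B'
```
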

2006-05-11 01:41:05

On May 10, 2006, at 5:15 PM, Mike Orr wrote:
> Darn it, why didn't htmltext subclass str!!! Peeking into the
> htmltext implementation, it stores the actual value in an attribute
> .s:

Have you looked at QPY?
It is essentially PTL, except that the htmltext class,
called "h8" in QPY, *is* a subclass of unicode.
That's the main reason QPY exists.
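
The practical difference between wrapping a string and subclassing it shows up as soon as ordinary string APIs are involved. In this Python 3 sketch, WrappedText and SubclassedText are hypothetical stand-ins for the two designs discussed in the thread, not Quixote's or QPY's actual classes.

```python
class WrappedText:
    """Hypothetical wrapper that keeps its value in a .s attribute."""

    def __init__(self, s):
        self.s = s


class SubclassedText(str):
    """Hypothetical str subclass, analogous to h8 subclassing unicode."""


wrapped = WrappedText("A\xa0B")
subclassed = SubclassedText("A\xa0B")

# The subclass is a real string: isinstance checks and concatenation just work.
joined = "x" + subclassed

# The wrapper is not, so string APIs reject it until you reach into .s.
try:
    str(wrapped, "latin-1")
    wrapper_rejected = False
except TypeError:
    wrapper_rejected = True
unwrapped = wrapped.s
```
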

2006-05-11 01:54:30

On May 10, 2006, at 5:15 PM, Mike Orr wrote:
> return unicode(val.s, 'latin1') # Cheetah > 1.0rc1 compatibility.
>
> TypeError: decoding Unicode is not supported

This happens to be a case where the htmltext object is
wrapping a unicode instance. In a case like that, you
just want val.s if you want the unicode instance.

> How about this?
>
> return unicode(val.s) # Cheetah > 1.0rc1 compatibility.
>
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in
> position 412: ordinal not in range(128)

Is something calling str() on this return value?
Something not shown here is trying to encode
a str from the unicode instance.

> F**k! OK, the trick I used in TurboGears:
>
> return unicode(val.s, 'latin1').encode('latin1') # Cheetah > 1.0rc1 compatibility.
>
> TypeError: decoding Unicode is not supported

This looks like val.s is already a unicode instance.
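
The general rule behind both TypeErrors is to decode only values that are still raw bytes. This is a minimal Python 3 sketch, assuming Latin-1 as the storage encoding; as_text() is a hypothetical helper, and in Python 3 the equivalent mistake, str(text, encoding) on something that is already text, raises a similar TypeError.

```python
def as_text(value, encoding="latin-1"):
    """Return a str, decoding only values that are still raw bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return str(value)  # already text (or some other object): no decoding step


from_bytes = as_text(b"A\xa0B")   # raw Latin-1 bytes, e.g. from a database driver
from_text = as_text("A\xa0B")     # already-decoded text passes through unchanged

# Decoding text that is already Unicode is what fails in the thread.
try:
    str("A\xa0B", "latin-1")
except TypeError as exc:
    decode_error = str(exc)
```
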

2006-05-11 02:02:07

On May 10, 2006, at 5:23 PM, Ian Bicking wrote:
> if not isinstance(s, basestring):
>     if hasattr(s, '__unicode__'):
>         s = unicode(s)
>     else:
>         s = str(s)

This logic is in quixote.html.stringify() (and in qpy.stringify()).

2006-05-11 03:59:03

On 5/10/06, David Binger <dbinger mems-exchange.org> wrote:
>
> On May 10, 2006, at 5:15 PM, Mike Orr wrote:
>
> > Darn it, why didn't htmltext subclass str!!! Peeking into the
> > htmltext implementation, it stores the actual value in an attribute
> > .s:
>
> Have you looked at QPY?
> It is essentially PTL, except that the htmltext class,
> called "h8" in QPY, *is* a subclass of unicode.
> That's the main reason QPY exists.

I've looked at QPY but haven't used it much, mainly because I thought you said it wasn't a good idea to mix PTL and QPY in the same program, and Quixote uses htmltext internally. One of my goals for my Quixote refactoring (which I haven't made any headway on; I've been working on SQL import stuff) was to make a Quixote that uses QPY instead of htmltext/PTL, but that's not done yet.

I started writing the program in PTL like my previous one, then my project manager said he really wanted to be able to tweak the HTML and preview it in a browser, so I switched to Cheetah, and I converted the last PTL template just last week. But I have a library of small HTML functions that still use PTL.
--
Mike Orr <sluggoster gmail.com>

2006-05-11 09:54:06

On May 10, 2006, at 11:59 PM, Mike Orr wrote:
> I've looked at QPY but haven't used it much, mainly because I thought
> you said it wasn't a good idea to mix PTL and QPY in the same program,

That's correct. PTL and QPY in the same program causes trouble because the two htmltext types do not recognize one another.
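
A small sketch of why two independent "already escaped" types do not mix: each escaper trusts only its own marker class, so text marked safe by the other type gets escaped a second time. EscapedA, EscapedB and escape_a() are hypothetical names in Python 3; this illustrates the failure mode, not PTL's or QPY's code.

```python
import html


class EscapedA(str):
    """Hypothetical 'already escaped' marker type from framework A."""


class EscapedB(str):
    """Hypothetical 'already escaped' marker type from framework B."""


def escape_a(value):
    """Framework A's escaper: it trusts only its own marker type."""
    if isinstance(value, EscapedA):
        return value
    return EscapedA(html.escape(str(value), quote=True))


once = escape_a("<b>")             # escaped once: '&lt;b&gt;'
again = escape_a(once)             # recognized as EscapedA, returned unchanged
foreign = EscapedB("&lt;b&gt;")    # already escaped by framework B
double = escape_a(foreign)         # not recognized, so escaped twice
```
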

2006-05-13 14:51:34

Hi.

In article <6e9196d20605101415v4a435a41r7a3e226a27b5314d mail.gmail.com>,
"Mike Orr" <sluggoster gmail.com> writes:
sluggoster> I tried sys.setdefaultencoding("latin1") but that has to be done in
sluggoster> the 'site' module; it's not available within a program.

I'm also using Quixote-2 for my Japanese (UTF-8) application. Here is a kludge I'm using to set the default encoding without modifying the global site module:

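    import sys
    reload(sys)  # site.py deletes sys.setdefaultencoding at startup; reloading sys restores it
    sys.setdefaultencoding("utf-8")  # Python 2 only; changes the default for the whole process
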
It's probably not the proper way, but it works perfectly for getting rid of the annoying 'ascii' codec error.
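
On Python 3 this kludge is neither possible nor needed: sys.getdefaultencoding() always returns 'utf-8', and sys.setdefaultencoding() no longer exists. A minimal sketch of the alternative is to name the codec explicitly at each boundary, decoding once on input and encoding once on output; the sample text and variable names are illustrative.

```python
import sys

default_encoding = sys.getdefaultencoding()      # always 'utf-8' on Python 3
has_setter = hasattr(sys, "setdefaultencoding")  # False on Python 3

raw = "日本語".encode("utf-8")   # bytes as they might arrive from a request or a database
text = raw.decode("utf-8")       # decode once, explicitly, on the way in
page = ("<p>%s</p>" % text).encode("utf-8")  # encode once, explicitly, on the way out
```
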
-- kayama

2006-05-15 11:52:44

In article <20060513.235134.74684306.kayama st.rim.or.jp>,
Akihiro KAYAMA <kayama st.rim.or.jp> writes:
kayama> I'm also using Quixote-2 for my Japanese(UTF-8) application. Here is
kayama> a kludge I'm using to set default encoding without modifying global
kayama> site module:
kayama>
kayama>   import sys
kayama>   reload(sys)
kayama>   sys.setdefaultencoding("utf-8")
kayama>
kayama> It can't be appropriate, but it works perfectly for getting rid of
kayama> annoying 'ascii' codec error.

In addition, today I tested the application on Python 2.4.2 + Quixote 2.4 and found that I could get rid of sys.setdefaultencoding(). When I wrote it last year, both Python 2.3 and Quixote 2.0 had mysterious Unicode-related problems (for example, with the % operator), so I gave up and simply changed the default encoding. Improvements in the Unicode support of the current versions of both packages seem to have resolved such problems. (CHANGES told me it was done in Quixote 2.2.)

Perhaps implicit conversion between unicode and str is evil, or the default encoding should not be site-global.

In article <6e9196d20605131024q51c78338jaafbf776ec9d8c6f mail.gmail.com>,
"Mike Orr" <sluggoster gmail.com> writes:
sluggoster> You're not supposed to reload builtin modules.

Exactly. I'm happy I can do what I'm supposed to at last. Thanks to everyone who contributes to these excellent products.
-- kayama