
Thread: Re: TT and UNICODE: Garbled special characters




Re: TT and UNICODE: Garbled special characters
2007-09-07 10:00:39
Stefan Kühn wrote:
>    GERMAN UMLAUT HERE: ___\xFC\xFC\xFC___
>
AFAIK, single-byte-width \xXX escapes are always treated as bytes, not
as characters. Even if they are outside the 7-bit range, and even in the
presence of the utf8 pragma.

Try inserting real Unicode characters into the string, explicitly
upgrading the string using utf8::upgrade or utf8 or use encoding 'latin1'.
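
The byte/character distinction described above can be inspected directly. The following is a minimal sketch, assuming a Perl 5.8.1 or later where the utf8:: functions are available without loading any module; the variable names are illustrative only. It shows that a string built from a \xFC escape starts out without the internal UTF8 flag, and that utf8::upgrade changes the internal representation without changing the characters.

```perl
use strict;
use warnings;

# "\xFC" yields one character with ordinal 0xFC, stored as a single byte.
my $s = "K\xFChn";
my $flag_before = utf8::is_utf8($s) ? 1 : 0;   # internal UTF8 flag is off

# utf8::upgrade converts the internal representation to UTF-8 in place.
# The logical characters and the length stay the same.
my $copy = $s;
utf8::upgrade($copy);
my $flag_after = utf8::is_utf8($copy) ? 1 : 0;

printf "flag before: %d, after: %d\n", $flag_before, $flag_after;
printf "length: %d, same characters: %s\n",
    length($copy), ($s eq $copy ? "yes" : "no");
```

Whether a given template or output layer treats the unflagged string differently is a property of that layer, so this only illustrates the string side of the problem.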

Matt


_______________________________________________
List: Catalystlists.rawmode.org
Listinfo: http://lists.rawmode.org/mailman/listinfo/catalyst
Searchable archive: http://www.mail-archive.com/catalystlists.rawmode.org/
Dev site: http://dev.catalyst.perl.org/

Re: TT and UNICODE: Garbled special characters
2007-09-07 10:33:21
Matt Lawrence wrote:
> Stefan Kühn wrote:
>
>>    GERMAN UMLAUT HERE: ___\xFC\xFC\xFC___
>>
> AFAIK, single-byte-width \xXX escapes are always treated as bytes, not
> as characters. Even if they are outside the 7-bit range, and even in the
> presence of the utf8 pragma.
>
> Try inserting real Unicode characters into the string, explicitly
> upgrading the string using utf8::upgrade or utf8 or use encoding 'latin1'.
>
Oops, that last paragraph wasn't very clear, and utf8::upgrade was not a
good suggestion. I'll try again:

# Option 1
use utf8; # recognise unicode characters in program text
my $name = "Stefan Kühn"; # use a real UTF-8 character here!

# Option 2
use Encode qw( decode );
my $name = decode("latin-1", "Stefan K\xfchn");

# Option 3
use encoding 'latin1';
my $name = "Stefan K\xfchn";

Once you have a unicode string that's internally marked as such,
C::P::Unicode should do the right thing with it.
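
As a rough check of that last point, here is a small sketch built on Option 2. It assumes that the output side encodes character strings as UTF-8; Encode::encode stands in for that step here and is not taken from the plugin's source. Variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw( decode encode );

# Decode Latin-1 bytes into a Perl character string (as in Option 2).
my $name = decode("iso-8859-1", "Stefan K\xfchn");

# Encode the character string to UTF-8 bytes, standing in for the
# encoding step assumed to happen on output.
my $bytes = encode("UTF-8", $name);

printf "characters: %d, UTF-8 bytes: %d\n", length($name), length($bytes);
```

The decoded string has one character for the umlaut, while the encoded output has two bytes for it, which is what a UTF-8 page expects.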

Matt



Re: TT and UNICODE: Garbled special characters
2007-09-07 10:52:01
On 9/7/07, Matt Lawrence <matt.lawrenceymogen.net> wrote:
> AFAIK, single-byte-width \xXX escapes are always treated as bytes, not
> as characters. Even if they are outside the 7-bit range, and even in the
> presence of the utf8 pragma.
>
> Try inserting real Unicode characters into the string, explicitly
> upgrading the string using utf8::upgrade or utf8 or use encoding 'latin1'.

This was a good hint:
* __utf8::upgrade__ on the string worked
* __use utf8;__ worked too
* when using __use encoding..__, Activestate Perl crashed
So, the byte/character handling is the clue.
Thanks, Stefan

