
Thread: UnicodeString encoding weirdness




UnicodeString encoding weirdness
United States
2007-10-22 03:36:27
Issue status update for http://smalltalk.gnu.org/node/113
Post a follow up: http://smalltalk.gnu.org/project/comments/add/113

 Project:      GNU Smalltalk
 Version:      <none>
 Component:    Base classes
 Category:     bug reports
 Priority:     normal
 Assigned to:  Unassigned
 Reported by:  elmex
 Updated by:   elmex
 Status:       active
 Attachment:   http://smalltalk.gnu.org/files/issues/unitest2.st.txt (849 bytes)

Take the attached program. Here it prints:

3
44
E3 <-> EF
81 <-> BF
AA <-> BE
E3 <-> E6
81 <-> A8
BE <-> B0
E3 <-> E7
81 <-> B8
9F <-> B0

But it should print (at least as far as my understanding of Unicode
and encodings goes):

3
33
E3 <-> E3
81 <-> 81
AA <-> AA
E3 <-> E3
81 <-> 81
BE <-> BE
E3 <-> E3
81 <-> 81
9F <-> 9F
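The right-hand column of the first listing is consistent with every UTF-16 code unit having been read with its two bytes swapped: EF BF BE is the UTF-8 form of U+FFFE, and E6 A8 B0 is U+6A30, the swap of U+306A (E3 81 AA). The GNU Smalltalk sketch below checks this with plain integer arithmetic, without Iconv or EncodedString. The block names are illustrative, and it only handles code points from U+0800 to U+FFFF (three UTF-8 bytes each). For each code point it prints the UTF-8 bytes of the character next to those of its byte-swapped counterpart.

```smalltalk
"Illustrative sketch: UTF-8 bytes of a character vs. of the same
 code point with its two bytes swapped (plain integer arithmetic)."
| utf8 swap hex |
"UTF-8 encoding of a code point in U+0800..U+FFFF (three bytes)."
utf8 := [:cp |
    Array
        with: (16rE0 bitOr: (cp bitShift: -12))
        with: (16r80 bitOr: ((cp bitShift: -6) bitAnd: 16r3F))
        with: (16r80 bitOr: (cp bitAnd: 16r3F))].
"Exchange the high and low byte of a 16-bit code unit."
swap := [:cp | ((cp bitAnd: 16rFF) bitShift: 8) bitOr: (cp bitShift: -8)].
"Render a collection of byte values as space-separated hex."
hex := [:bytes | bytes inject: '' into: [:acc :b | acc, (b printString: 16), ' ']].
"The BOM followed by the three characters from the report."
#(16rFEFF 16r306A 16r307E 16r305F) do: [:cp |
    Transcript showCr:
        (hex value: (utf8 value: cp)), '<-> ',
        (hex value: (utf8 value: (swap value: cp)))].
```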




_______________________________________________
help-smalltalk mailing list
help-smalltalk@gnu.org

http://lists.gnu.org/mailman/listinfo/help-smalltalk

UnicodeString encoding weirdness
United States
2007-10-22 04:01:23
Issue status update for http://smalltalk.gnu.org/project/issue/113
Post a follow up: http://smalltalk.gnu.org/project/comments/add/113

 Project:      GNU Smalltalk
 Version:      <none>
 Component:    Base classes
 Category:     bug reports
 Priority:     normal
 Assigned to:  Unassigned
 Reported by:  elmex
 Updated by:   bonzinip
 Status:       active
 Attachment:   http://smalltalk.gnu.org/files/issues/gst-encoding-lazy.patch (594 bytes)

EF-BF-BE is U+FFFE encoded in UTF-8, i.e. the Unicode "byte order mark"
(BOM, U+FEFF) read with its two bytes swapped.  The BOM was born as a
way to distinguish big- and little-endian UTF-16.  Since it's not really
a character, Iconv tries to strip it when converting to a UnicodeString,
but it is failing to do so in this case.
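Decoding the bytes by hand shows the difference: EF BB BF is U+FEFF itself, while EF BF BE decodes to U+FFFE. A minimal GNU Smalltalk sketch of three-byte UTF-8 decoding; the block name `decode3` is illustrative, and it assumes a well-formed 1110xxxx 10xxxxxx 10xxxxxx sequence without doing any validation.

```smalltalk
"Illustrative sketch: decode a well-formed three-byte UTF-8 sequence
 (1110xxxx 10xxxxxx 10xxxxxx) into its code point. No validation."
| decode3 |
decode3 := [:b1 :b2 :b3 |
    (((b1 bitAnd: 16r0F) bitShift: 12)
        bitOr: ((b2 bitAnd: 16r3F) bitShift: 6))
        bitOr: (b3 bitAnd: 16r3F)].
"The real BOM, then the bytes seen in the bug report."
Transcript showCr: 'EF BB BF -> U+',
    ((decode3 value: 16rEF value: 16rBB value: 16rBF) printString: 16).
Transcript showCr: 'EF BF BE -> U+',
    ((decode3 value: 16rEF value: 16rBF value: 16rBE) printString: 16).
```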

Now, under Mac OS X I get the expected result; under Linux I get yours.
The reason is that my Mac is big-endian, so Iconv produces big-endian
UTF-16, while Linux produces little-endian UTF-16.  Since UTF-16 without
an explicit byte order defaults to big-endian, the Mac happens to get
the right thing, while on Linux the encoding gets messed up.  So later
on, the "pipe peekFor: $<16rFEFF>" statement that strips the BOM does
not work.
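The mismatch can be reproduced with plain integers: little-endian UTF-16 starts with the bytes FF FE. Read in the default big-endian order they form U+FFFE, which is not the U+FEFF that the `peekFor:` check is looking for. A GNU Smalltalk sketch, independent of Iconv and EncodedString (the variable names are illustrative):

```smalltalk
"Illustrative sketch: the two leading bytes of little-endian UTF-16
 interpreted as a big-endian and as a little-endian code unit."
| bytes bom bigEndian littleEndian |
bytes := #[255 254].   "FF FE: the BOM as written by little-endian UTF-16"
bom := 16rFEFF.
bigEndian := ((bytes at: 1) bitShift: 8) bitOr: (bytes at: 2).
littleEndian := ((bytes at: 2) bitShift: 8) bitOr: (bytes at: 1).
Transcript showCr: 'read as big-endian:    U+', (bigEndian printString: 16),
    ' is BOM? ', (bigEndian = bom) printString.
Transcript showCr: 'read as little-endian: U+', (littleEndian printString: 16),
    ' is BOM? ', (littleEndian = bom) printString.
```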

The attached patch fixes this by making EncodedString look for a BOM
when retrieving the encoding, rather than when setting it.
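One way to picture "look for a BOM when retrieving the encoding" is a function that turns a generic 'UTF-16' into an explicit byte order only at the moment the bytes are inspected. The sketch below is hypothetical and is not the code in the patch, which is not reproduced in this thread; `effectiveEncoding` is an illustrative name, and it falls back to big-endian when there is no BOM, matching the default described above.

```smalltalk
"Hypothetical sketch, not the actual patch: resolve a generic 'UTF-16'
 into an explicit byte order by inspecting the BOM when the encoding
 is asked for. Other encodings are returned unchanged."
| effectiveEncoding |
effectiveEncoding := [:bytes :declared |
    (declared = 'UTF-16' and: [bytes size >= 2])
        ifFalse: [declared]
        ifTrue: [
            ((bytes at: 1) = 16rFE and: [(bytes at: 2) = 16rFF])
                ifTrue: ['UTF-16BE']
                ifFalse: [
                    ((bytes at: 1) = 16rFF and: [(bytes at: 2) = 16rFE])
                        ifTrue: ['UTF-16LE']
                        "No BOM: assume the big-endian default."
                        ifFalse: ['UTF-16BE']]]].
Transcript showCr: (effectiveEncoding value: #[254 255 0 65] value: 'UTF-16').
Transcript showCr: (effectiveEncoding value: #[255 254 65 0] value: 'UTF-16').
Transcript showCr: (effectiveEncoding value: #[0 65] value: 'UTF-16').
```

Deferring the decision this way means the byte order is taken from whatever bytes the string holds when it is read, so it no longer matters which byte order was in effect when the encoding was set.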





UnicodeString encoding weirdness
United States
2007-10-22 04:25:21
Issue status update for http://smalltalk.gnu.org/project/issue/113
Post a follow up: http://smalltalk.gnu.org/project/comments/add/113

 Project:      GNU Smalltalk
 Version:      <none>
 Component:    Base classes
 Category:     bug reports
 Priority:     normal
-Assigned to:  Unassigned
+Assigned to:  bonzinip
 Reported by:  elmex
 Updated by:   bonzinip
-Status:       active
+Status:       fixed

Fixed in patch-612, which is the same patch I posted plus this test case:

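  "A two-byte UTF-16 string holding nothing but a BOM"
  str := EncodedString fromString: (String new: 2) encoding: 'UTF-16'.
  "Big-endian BOM (FE FF): the BOM must not count as a character"
  str valueAt: 1 put: 254; valueAt: 2 put: 255.
  self assert: str numberOfCharacters = 0.
  "Little-endian BOM (FF FE): likewise zero characters"
  str valueAt: 1 put: 255; valueAt: 2 put: 254.
  self assert: str numberOfCharacters = 0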

Thanks!





Re: UnicodeString encoding weirdness
2007-10-22 04:51:09
On Mon, Oct 22, 2007 at 02:01:23AM -0700, Paolo Bonzini wrote:
> Issue status update for
> http://smalltalk.gnu.org/project/issue/113
>
[.snip.]
>
> The attached patch fixes this by making EncodedString look for a BOM
> when retrieving the encoding, rather than when setting it.

Thanks it works now!

I hope you don't mind me filing so many bug reports. I've recently been
working on my chat implementation, which uses JSON, and I'm eager to
support Unicode.


Robin


