Sungjin Chun wrote:
> Hi,
>
> When I run the following:
>
> (I18N.EncodedStream encoding: (UnicodeString fromString: '전성진')) contents !
>
> gst emits endless messages related to garbage collecting then crashes
> with segmentation faults.
Yes, it is a stupid bug. When using the system function iconv, gst has
to split the UnicodeCharacters back into 8-bit Characters, and here it
gets stuck in an infinite loop. The first character for example is
$<16rC804>, and the "C8" byte is created as a UnicodeCharacter rather
than a Character. This causes a recursive creation of another
I18N.EncodedStream.
The attached patch fixes the bug; thanks for reporting it.
In my testing, I only used Eastern-European characters where all bytes
are < 0x80.
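A minimal sketch of the distinction described above. It assumes a GNU Smalltalk image in which asCharacter on an integer above 127 can answer a UnicodeCharacter (as the explanation says), while Character value: answers a plain 8-bit Character (as the patch relies on). The variable names are illustrative only and do not come from Sets.st:

```smalltalk
"Split the code point 16rC804 into its two bytes."
| codePoint high low |
codePoint := 16rC804.
high := codePoint bitShift: -8.    "upper byte, 16rC8"
low := codePoint bitAnd: 255.      "lower byte, 16r04"

"Character value: is the constructor the patch switches to for raw bytes."
(Character value: high) class printNl.
(Character value: low) class printNl.

"For comparison: the conversion the patch removes. For a value above 127
 this is the call described above as producing a UnicodeCharacter."
high asCharacter class printNl.
```

Comparing the classes printed for the high byte shows whether a given image behaves the way the bug report describes.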
> And is there any simple example for processing a UTF-8 encoded string?
>
Can you expand?
Paolo
--- orig/i18n/Sets.st
+++ mod/i18n/Sets.st
@@ -718,13 +718,13 @@ next
been extracted."
wch := answer := self nextInput codePoint.
wch := (wch bitShift: -8) + 16r1000000.
- ^(answer bitAnd: 255) asCharacter
+ ^Character value: (answer bitAnd: 255)
].
"Answer any other byte"
answer := wch bitAnd: 255.
wch := wch bitShift: -8.
- ^answer asCharacter
+ ^Character value: answer
!
flush
@@ -754,7 +754,7 @@ next
wch := answer := self nextInput codePoint.
wch := wch bitAnd: 16rFFFFFF.
count := 3.
- ^(answer bitShift: -24) asCharacter
+ ^Character value: (answer bitShift: -24)
].
"Answer any other byte. We keep things so that the byte we answer
@@ -763,7 +763,7 @@ next
wch := wch bitAnd: 16rFFFF.
wch := wch bitShift: 8.
count := count - 1.
- ^answer asCharacter
+ ^Character value: answer
!
flush
_______________________________________________
help-smalltalk mailing list
help-smalltalk gnu.org
http://lists.gnu.org/mailman/listinfo/help-smalltalk