Hi,
For me, a string should not be limited to a collection of single-byte
characters. A string is a string, not a simple collection of bytes, isn't
it? I think Squeak's approach (or OpenStep's approach, with an abstract
public string class and concrete private subclasses that implement the
different kinds of string) is a good model. But since I'm not currently
working much on GNU Smalltalk, this may not be the best idea for GNU
Smalltalk's case.
P.S.
I DO think that strlen is not meant for Unicode strings (or, more
precisely, for any multi-byte encoded string) and is a bad design: it is
limited to single-byte encodings. I DO think that a modern language should
support Unicode-like strings. I DO think Smalltalk is MODERN.
----- Original Message -----
From: "Paolo Bonzini" <paolo.bonzini lu.unisi.ch>
To: "Chun Sungjin" <chunsj embian.com>
Cc: "GNU Smalltalk" <help-smalltalk gnu.org>
Sent: Friday, July 07, 2006 6:17 PM
Subject: Re: {Spam?} Re: [Help-smalltalk] [Q] Unicode String?
> Chun Sungjin wrote:
> > Hi,
> >
> > main problem is that for example, if I did create an instance of
> > string like this;
> >
> > a := 'Some MultiByte Encoded String'.
> >
> > then
> >
> > a size
> >
> > does not answer correct length of string.
> Well, strlen does not in C, too. You need mbrlen, and #size is more
> like strlen than mbrlen.
>
> Also, the result heavily depends on the chosen character set. If we
> want to have #utf8Size, that's fine. But #size should be the number of
> *bytes*, not of characters.
>
> I'm seeing now if I can add an EncodedStream method that extracts
> Unicode characters. Then what you wanted would be something like
>
> (EncodedStream wordsOn: 'some string') contents size
>
> for which, of course, we can add a utility method.
>
> Paolo
>