Thread: Re: Oniguruma and \p




Re: Oniguruma and \p
2008-05-28 04:35:18
Hi,

In message "Re: Oniguruma and \p"
    on Wed, 28 May 2008 18:10:24 +0900, ts <decouxmoulon.inra.fr> writes:

|Martin Duerst wrote:
|> This used to work, but the last time I checked
|> was several months ago. 
|
| I know nothing about this thing (encoding)
| but is this normal ?

It worked as we designed.  But this case, we lost script encoding.
Not good.

							matz.


Re: Oniguruma and \p
2008-05-28 08:30:00
On May 28, 2008, at 4:35 AM, Yukihiro Matsumoto wrote:

> It worked as we designed.  But this case, we lost script encoding.
> Not good.

Rather than using script encoding, perhaps a regular expression that's
US-ASCII should adopt the encoding of the string it is matching, so I
could do

    utf_string =~ /p/
    euc_string =~ /p/
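
A minimal sketch of the idea, under stated assumptions: the sample
text (Greek alpha followed by "b") and the property name \p{Greek}
are illustrative choices, not taken from the messages above. In
current Ruby versions, a regexp literal that contains only plain ASCII
is not pinned to one encoding, so the same object can be matched
against both strings. The /u flag on the second regexp explicitly
pins it to UTF-8, which sidesteps the script-encoding question.

```ruby
# Sample text: Greek alpha + "b". Written with an escape so the
# example does not depend on the script encoding.
utf_string = "\u03B1b"
euc_string = utf_string.encode("EUC-JP")  # same text, EUC-JP bytes

# ASCII-only regexp with no encoding-specific escapes.
ascii_re = /b/

# The one regexp matches strings in either encoding; =~ returns a
# character index, not a byte offset.
utf_index = utf_string =~ ascii_re
euc_index = euc_string =~ ascii_re

# Property escape, pinned to UTF-8 with the /u flag (assumed name).
greek_re = /\p{Greek}/u
greek_index = utf_string =~ greek_re

puts "ascii_re fixed encoding? #{ascii_re.fixed_encoding?}"
puts "greek_re encoding:       #{greek_re.encoding}"
puts "indexes: #{utf_index}, #{euc_index}, #{greek_index}"
```

Whether a \p regexp without /u can be applied to an EUC-JP string
depends on the Ruby version and on how the regexp was created, so the
sketch makes no claim about that case.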

Dave


Re: Oniguruma and \p
2008-05-31 07:40:24
[written before Matz's additional answer]

At 18:35 08/05/28, Yukihiro Matsumoto wrote:
>Hi,
>
>In message "Re: Oniguruma and \p"
>    on Wed, 28 May 2008 18:10:24 +0900, ts <decouxmoulon.inra.fr> writes:
>
>|Martin Duerst wrote:
>|> This used to work, but the last time I checked
>|> was several months ago. 
>|
>| I know nothing about this thing (encoding)
>| but is this normal ?
>
>It worked as we designed.  But this case, we lost script encoding.
>Not good.

I'm not sure this is a script encoding problem.

The string is in UTF-8. The regexp is, on the surface, ASCII-only.
But in meaning, it contains something more than ASCII.
Wouldn't it work if an ASCII regexp, when applied to some
'more-than-just-ASCII' string, were automatically upgraded to the
encoding of that string?

Or even more, aren't things such as \p independent of the
encoding of the regexp? Even in a script written in EUC-JP,
I can imagine having a UTF-8 string, and doing /\p should
try to find Greek *as encoded in the string*, not as encoded in
the regexp. This may be easy to do or not depending on the
actual implementation, but in terms of object-oriented thinking,
it definitely should work: The regexp asks the string, at the
right point: do you have a Greek character here? It's up to
the string to know (or not) what a Greek character is.
This is different from matching of literally encoded characters,
because there the implementation matches the bytes (while, for
some encodings, it also has to take care of character boundaries).
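
The "regexp asks the string" idea can be approximated in plain Ruby.
This is only a sketch, not how Oniguruma works internally: the helper
name greek_at? is hypothetical, the property name \p{Greek} is an
assumption, and the check is done by transcoding one character to
UTF-8 rather than by consulting per-encoding property tables.

```ruby
# Hypothetical helper: ask the string whether the character at `index`
# is Greek, whatever encoding the string itself uses.
def greek_at?(string, index)
  char = string[index]            # one character, in the string's encoding
  return false if char.nil?       # index past the end of the string

  # Swapped-in technique: transcode the single character to UTF-8 and
  # test it with a regexp pinned to UTF-8 by the /u flag.
  utf8_char = char.encode("UTF-8")
  !!(utf8_char =~ /\p{Greek}/u)
end

utf_string = "\u03B1b"                    # Greek alpha + "b", UTF-8
euc_string = utf_string.encode("EUC-JP")  # same text, EUC-JP bytes

puts greek_at?(utf_string, 0)
puts greek_at?(euc_string, 0)
puts greek_at?(euc_string, 1)
```

The caller never mentions an encoding; the answer comes from the
character as the string holds it, which is the behaviour argued for
above.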

Regards,    Martin.



#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerstit.aoyama.ac.jp


