On Thu, 2007-07-12 at 19:55 -0500, Michael Boudreau wrote:
> Thanks! That does what I need, except...
>
> My experience doesn't quite match what the FAQ says to expect.
> Using Perl 5.6.0:
>
> use utf8;
> s/([x-x])/'&#' . ord($1) . ';'/gse;
>
> Produces:
>
> In XML input: Output after regex:
> ™ => ™ [trademark symbol]
> é => é [lowercase e with acute accent]
>
>
> use utf8; # [note the FAQ says this is not required with 5.6]
It's not required with 5.8. It is required with 5.6.
> s/([^\x20-\x7F])/'&#' . ord($1) . ';'/gse;
Sorry, I keep forgetting to update the FAQ. You probably really want:
s/([^\x00-\x7F])/'&#' . ord($1) . ';'/gse;
Otherwise it converts all your CR, LF and Tab characters too.
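To make the difference between the two ranges concrete, here is a minimal sketch. The sample string and the variable names are made up for illustration; only the two regexes come from this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample: a tab, a trademark sign (U+2122) and a newline.
my $sample = "a\tb \x{2122}\n";

# Range starting at \x20: control characters fall outside it, so the
# tab and the newline are turned into entities as well.
(my $from_space = $sample) =~ s/([^\x20-\x7F])/'&#' . ord($1) . ';'/gse;

# Range starting at \x00: only the non-ASCII character is converted.
(my $from_nul = $sample) =~ s/([^\x00-\x7F])/'&#' . ord($1) . ';'/gse;

print "from \\x20: $from_space\n";
print "from \\x00: $from_nul";
```

With the first range the tab and newline come out as &#9; and &#10;. With the second they are left alone and only the trademark sign becomes &#8482;.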
> Produces:
>
> In XML input: Output after regex:
> ™ => ™
> é => é
>
> But leaving out 'use utf8'; and still using the second regex:
>
> In XML input: Output after regex:
> ™ => â„¢
> é => é
Here's a short test script that demonstrates the regex working in 5.8
without the 'use utf8' line:
#!/usr/bin/perl
require 5.008;
use strict;
use warnings;
my $string = "TM: \x{2122}";
$string =~ s/([^\x00-\x7F])/'&#' . ord($1) . ';'/gse;
print $string, "\n";
which outputs:
TM: &#8482;
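The three-character result in your last case looks like what you get when the regex sees the UTF-8 bytes rather than the character. I am assuming that is what happened in your run. Here is a rough sketch of the difference on 5.8, using the core Encode module to decode the bytes first. The variable names are only illustrative.

```perl
#!/usr/bin/perl
require 5.008;
use strict;
use warnings;
use Encode qw(decode);

# The three UTF-8 bytes that encode the trademark sign (U+2122).
my $bytes = "\xE2\x84\xA2";

# Treated as a byte string, each byte is converted on its own.
(my $as_bytes = $bytes) =~ s/([^\x00-\x7F])/'&#' . ord($1) . ';'/gse;

# Decoded into a character string first, the sign is a single character.
my $chars = decode('UTF-8', $bytes);
(my $as_chars = $chars) =~ s/([^\x00-\x7F])/'&#' . ord($1) . ';'/gse;

print "bytes: $as_bytes\n";
print "chars: $as_chars\n";
```

The byte string produces three entities (226, 132, 162), while the decoded string produces the single entity 8482.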
Cheers
Grant
_______________________________________________
Perl-XML mailing list
Perl-XML@listserv.ActiveState.com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs