On Wed, Jan 25, 2006 at 12:37:54PM -0800, Tatsuhiko Miyagawa wrote:
>
> This is not a direct answer to your question, but my
> Template::Provider::Encoding module will help your situation. When you
> use it with Stash::ForceUTF8, you don't have to care about UTF-8
> auto-upgrading problem.

Thanks. I like this idea as it forces the encoding to be defined in
the templates. Might be nice to specify something other than utf8 as
the default encoding if no encoding is specified in the template,
though.

Since Template::Provider::Encoding calls Template::Provider::_load,
should it not check for the utf8 flag before trying to decode it?
If a template had a BOM then the returned data would already be decoded.

The other option would be to set UNICODE => 0, but that would not
handle the case of a scalar being passed in that was already utf8.

Now, if only there were a module that prevented people from pasting
from MS Word.

A few comments/questions about TT's handling of encoding. Please
correct me if I'm wrong about anything.

Template::Provider will attempt to determine the encoding by BOM for
templates supplied by file name or a handle. Scalar refs are not
touched, so they need to be correctly decoded before being passed to
process().
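
Since scalar refs skip the BOM check, the caller has to do the decoding. A minimal sketch of that, assuming the template text arrives as UTF-8 octets (the sample text and the variable names are made up for illustration, and the TT call only runs if Template is installed):

```perl
use strict;
use warnings;
use Encode qw(decode);

# Template source as raw UTF-8 octets, e.g. as read from a database.
# (Illustrative data, not from any real template.)
my $octets = "caf\xC3\xA9 [% name %]";

# Decode to a Perl character string before handing it to TT.
my $template = decode('UTF-8', $octets);

# Only exercise TT if Template Toolkit is available on this system.
my $output = '';
if (eval { require Template; 1 }) {
    my $tt = Template->new;
    $tt->process(\$template, { name => 'world' }, \$output)
        or die $tt->error;
}
```

After the decode, length($template) counts characters rather than bytes, which is what the rest of the template pipeline expects.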

The BOM detection happens automatically for perl > 5.007. There's a
"UNICODE" option to the provider, so this feature can be disabled. It
seems that this option is not documented currently (in my quick grep).

Obviously, you need an editor or some way to write the BOM to all the
template files to use this feature.
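
If the editor won't write a BOM, a small script can. This is only a sketch: add_utf8_bom is a hypothetical helper name, and it assumes the file contents are already valid UTF-8.

```perl
use strict;
use warnings;

# Hypothetical helper: prepend a UTF-8 BOM to a file that lacks one.
# Assumes the file's contents are already valid UTF-8.
# Returns 1 if a BOM was added, 0 if one was already present.
sub add_utf8_bom {
    my ($path) = @_;
    my $bom = "\xEF\xBB\xBF";

    # Read the whole file as raw bytes.
    open my $in, '<:raw', $path or die "Cannot read $path: $!";
    my $data = do { local $/; <$in> };
    close $in;
    $data = '' unless defined $data;

    return 0 if substr($data, 0, 3) eq $bom;    # already has a BOM

    # Rewrite the file with the BOM in front.
    open my $out, '>:raw', $path or die "Cannot write $path: $!";
    print {$out} $bom, $data;
    close $out or die "Cannot close $path: $!";
    return 1;
}
```

Something like "add_utf8_bom($_) for @ARGV;" would then run it over a list of template files. It rewrites files in place, so try it on copies first.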

Now:

- If a BOM is not found then the text is left alone. It might be nice
  to specify a default encoding so that if no BOM is found then the
  text is still decoded instead of left as raw data.

  So, in my case I could specify cp1252, and if UTF-8 is not detected by
  BOM then it is assumed that it's cp1252 and then converted to a perl
  string.

- I also wonder if _decode_unicode should just return if the input
  text is already flagged as utf8. This would be useful when supplying
  a file handle that already has a PerlIO layer set. Currently, if you
  pass in a file handle with <:utf8 set and the file also contains a
  BOM, you will get:

      Cannot decode string with wide characters at /usr/lib/perl/5.8/Encode.pm line 166, <$fh> chunk 1.
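
To make both suggestions concrete, here is a rough sketch of the behaviour I have in mind. It is not the code in Template::Provider: decode_template_text and its $default argument are hypothetical, and the BOM table is just the usual Unicode signatures.

```perl
use strict;
use warnings;
use Encode qw(decode);

# BOM signatures, longest first so UTF-32LE is not mistaken for UTF-16LE.
my @BOMS = (
    [ "\x00\x00\xFE\xFF", 'UTF-32BE' ],
    [ "\xFF\xFE\x00\x00", 'UTF-32LE' ],
    [ "\xEF\xBB\xBF",     'UTF-8'    ],
    [ "\xFE\xFF",         'UTF-16BE' ],
    [ "\xFF\xFE",         'UTF-16LE' ],
);

# Hypothetical helper, not part of Template::Provider.
sub decode_template_text {
    my ($text, $default) = @_;

    # Already a character string (e.g. read through a :utf8 layer):
    # return it untouched instead of trying to decode it again.
    return $text if utf8::is_utf8($text);

    # A BOM wins: strip it and decode with the encoding it announces.
    for my $bom (@BOMS) {
        my ($sig, $enc) = @$bom;
        if (substr($text, 0, length $sig) eq $sig) {
            return decode($enc, substr($text, length $sig));
        }
    }

    # No BOM: use the caller's default encoding, or leave the bytes alone.
    return defined $default ? decode($default, $text) : $text;
}
```

With this, decode_template_text($bytes, 'cp1252') would treat BOM-less input as cp1252, while text that is already flagged as utf8 passes straight through.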

Oh, BTW. Isn't this supposed to be correct according to the IO::File
docs?

    $ perl -MIO::File -le "IO::File->new('utf8.html', 'r')->binmode(':utf8')"
    usage $fh->binmode([LAYER]) at -e line 1

This works, though:

    binmode($fh, ':utf8')
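
For completeness, the working form in a slightly fuller sketch. The temporary file stands in for utf8.html and its contents are made up:

```perl
use strict;
use warnings;
use IO::File;
use File::Temp qw(tempfile);

# Write a small UTF-8 file to read back (stand-in for utf8.html).
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp, ':raw';
print {$tmp} "caf\xC3\xA9\n";
close $tmp or die "close failed: $!";

# Open with IO::File, then set the layer with the builtin binmode
# rather than the method call.
my $fh = IO::File->new($path, 'r') or die "Cannot open $path: $!";
binmode($fh, ':utf8') or die "binmode failed: $!";

my $line = <$fh>;
chomp $line;
$fh->close;
```

Here $line comes back as a four-character string, since the layer decodes the two-byte sequence into a single character.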
--
Bill Moseley
moseley hank.org